17 July 2009

Spiders Explained

Basically, all search engine spiders work on the same principle: they crawl the Web and index pages, which are stored in a database. The search engine later applies various algorithms to determine the ranking, relevancy, etc. of the collected pages. While the algorithms for calculating ranking and relevancy differ widely among search engines, the way they index sites is more or less uniform, and it is very important that you know what spiders are interested in and what they neglect.

Search engine spiders are robots and they do not read your pages the way a human does. Instead, they tend to see only particular things and are blind to many extras (Flash, JavaScript) that are intended for humans. Since spiders determine whether humans will find your site, it is worth considering what spiders like and what they don't.

Flash, JavaScript, Image Text or Frames?
Flash, JavaScript and image text are NOT visible to search engines. Frames are a real disaster in terms of SEO ranking. All of them might be great in terms of design and usability, but for search engines they are absolutely wrong. A serious mistake is to have a Flash intro page (frames or no frames, this will hardly make the situation worse) with the keywords buried in the animation. Use the Search Engine Spider Simulator tool to check a page with Flash and images (and preferably no text or inbound or outbound hyperlinks) and you will see that to search engines this page appears almost blank.

Running your site through this simulator will show you more than the fact that Flash and JavaScript are not SEO favorites. In a way, spiders are like text browsers: they don't see anything that is not a piece of text. So an image with text in it means nothing to a spider, which will simply ignore it. A workaround (recommended as an SEO best practice) is to include a meaningful description of the image in the ALT attribute of the <img> tag, but be careful not to use too many keywords in it because you risk penalties for keyword stuffing. The ALT attribute is especially important when you use images rather than text for links. You can also use ALT text to describe what a Flash movie is about but, again, be careful not to cross the line between optimization and over-optimization.
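To make the "text browser" idea concrete, here is a minimal sketch in Python of what a text-only crawler could read from a page. It is not how the Search Engine Spider Simulator or any real search engine works internally; the class name `SpiderView` and the function `spider_view` are made up for illustration, and it relies only on the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects roughly what a text-only crawler could read (illustrative only)."""

    SKIPPED = {"script", "style"}  # contents of these tags are not page text

    def __init__(self):
        super().__init__()
        self.text = []        # visible text fragments
        self.alt_texts = []   # ALT attributes found on images
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alt_texts.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def spider_view(html):
    """Return (visible_text_fragments, image_alt_texts) for an HTML string."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return parser.text, parser.alt_texts
```

Feeding it a page that consists only of images without ALT text and script-generated content returns two empty lists, which is the "almost blank" page described above.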

Are Your Hyperlinks Spiderable?

The search engine spider simulator can be of great help when you are trying to figure out whether your hyperlinks lead to the right place. For instance, link exchange websites often put fake links to your site with JavaScript (using mouse-over events and similar tricks to make the link look genuine), but this is not a link that search engines will see and follow. Since the spider simulator will not display such links, you'll know that something is wrong with the link.
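A rough way to spot such fake links yourself is to separate plain `<a href="...">` links from elements that only act as links through script. The sketch below is an assumption about how one might do this, not the simulator's actual logic; `LinkAudit` and `audit_links` are hypothetical names, and the rule used (an anchor counts only if its `href` is non-empty, is not `#`, and does not start with `javascript:`) is a simplification.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Separates plain links from script-driven pseudo-links (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.followable = []   # href values a crawler could follow
        self.script_only = []  # (tag, raw value) pairs that depend on JavaScript

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        usable = bool(href) and href != "#" and not href.lower().startswith("javascript:")
        if tag == "a" and usable:
            self.followable.append(href)
        elif tag == "a" or attrs.get("onclick"):
            # Anchors without a usable href, and any element wired up with onclick.
            self.script_only.append((tag, attrs.get("onclick") or href))


def audit_links(html):
    """Return (followable_hrefs, script_only_elements) for an HTML string."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.followable, parser.script_only
```

If a link that a partner site claims to have placed shows up only in the second list, a spider is unlikely to count it.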

It is highly recommended to use the <noscript> tag rather than rely on JavaScript-based menus alone. The reason is that JavaScript-based menus are not spiderable and all the links in them will be ignored. The solution to this problem is to put all menu item links in the <noscript> tag. The <noscript> tag can hold a lot, but please avoid using it for link stuffing or any other kind of SEO manipulation.

If you happen to have tons of hyperlinks on your pages (although it is highly recommended to have fewer than 100 hyperlinks on a page), you might have a hard time checking whether they are all OK. For instance, if you have pages that display “403 Forbidden”, “404 Page Not Found” or similar errors that prevent the spider from accessing the page, then it is certain that this page will not be indexed. Note that a spider simulator does not deal with 403 and 404 errors, because it checks where links lead to, not whether the target of the link is in place. You therefore need other tools to check that the targets of your hyperlinks are in place and are the intended ones.
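One such "other tool" can be a few lines of Python. The following is a minimal sketch, assuming the target servers answer HEAD requests (some do not, in which case a GET would be needed); `link_status` and `broken_links` are hypothetical helper names, and the `opener` parameter exists only so the check can be tried without touching the network.

```python
import urllib.error
import urllib.request


def link_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for url, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 403, 404 and other error statuses arrive as exceptions.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout and similar.
        return None


def broken_links(urls, opener=urllib.request.urlopen):
    """Map every URL that is unreachable or answers with status >= 400 to its status."""
    report = {}
    for url in urls:
        status = link_status(url, opener=opener)
        if status is None or status >= 400:
            report[url] = status
    return report
```

Calling `broken_links` with the list of hyperlinks collected from a page returns only the problem URLs, so an empty dictionary means every target responded.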

Meta Keywords and Meta Description

Meta keywords and meta description, as the names imply, are found in the <meta> tags of an HTML page. Meta keywords and meta descriptions were once the single most important criterion for determining the relevance of a page, but search engines now employ alternative mechanisms for determining relevancy, so you can safely skip listing keywords and a description in meta tags. The exception is when you want to give the spider instructions about what to index and what not to; apart from that, meta tags are not very useful anymore.
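To see which meta tags a page actually exposes, including any indexing instructions, something like the following sketch can help. It assumes the instructions are given in the conventional `<meta name="robots" content="...">` form with comma-separated values such as `noindex` or `nofollow`; `MetaReader`, `read_meta` and `robots_directives` are names invented for this example.

```python
from html.parser import HTMLParser


class MetaReader(HTMLParser):
    """Collects name/content pairs from <meta> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name:
                self.meta[name] = attrs.get("content") or ""


def read_meta(html):
    """Return a dict of lower-cased meta names to their content strings."""
    parser = MetaReader()
    parser.feed(html)
    parser.close()
    return parser.meta


def robots_directives(meta):
    """Split the robots meta content into a set of lower-cased directives."""
    return {d.strip().lower() for d in meta.get("robots", "").split(",") if d.strip()}
```

A page whose directives include `noindex` is asking spiders not to index it, which is worth checking before wondering why the page does not appear in search results.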
Contributed by Abhishek SEO
