Having the robots.txt File in Shape
A robots.txt file tells search engine spiders which pages they should not crawl.
It does not, by itself, help a search engine spider the rest of the site.
Part of search engine optimization (SEO) is identifying why a site is difficult
to index. Because spiders have limited time and space for each site, it pays to
ask them to skip some pages.
Which pages? Sometimes the shopping cart gets indexed, so the search engine
needs to be told to skip those files. The pages one would rather have indexed
are the sales and information pages.
Sensitive data and the contents of cgi-bin are other things one would want
skipped. The robots.txt file keeps them out of the search engine's index.
This way the search engine's resources go toward finding the important files.
It also reduces exposure to site hackers, because a search engine tends to
index anything that comes into its purview, including password files. Keep in
mind that robots.txt is itself publicly readable and is only a request to
well-behaved spiders, so it does not replace real access control.
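As a minimal sketch of the idea above, the following Python snippet parses a small rule set and checks which paths a spider may fetch. It uses the standard library module urllib.robotparser. The /cart/ and /cgi-bin/ paths, the sample page paths, and the bot name ExampleBot are assumptions chosen for illustration, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: skip the shopping cart and cgi-bin, allow the rest.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cart/
"""


def is_allowed(rules: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/cart/checkout", "/cgi-bin/form.cgi", "/products/widget.html"):
        print(path, "allowed" if is_allowed(RULES, path) else "skipped")
```

With these rules the cart and cgi-bin paths are skipped, while the sales page stays open to spiders.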
If the robots.txt file is set up incorrectly by accident, it can prevent the
search engine from crawling the site at all. What instructions do that?
Surprisingly few:
User-agent: *
Disallow: /
Take care to set up the robots.txt file correctly; otherwise the chances of
your business being found through search become remote, virtually non-existent.
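One way to catch this mistake before it goes live is a quick automated check. The sketch below, again using Python's standard urllib.robotparser, reports whether a rule set shuts a spider out of the site root. The function name homepage_blocked and the bot name ExampleBot are hypothetical, and testing only the root path is a simplification.

```python
from urllib.robotparser import RobotFileParser

# The two lines that block every spider from the whole site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

# An empty Disallow value blocks nothing.
ALLOW_ALL = ["User-agent: *", "Disallow:"]


def homepage_blocked(lines, agent="ExampleBot"):
    """Return True if the rules forbid `agent` from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(lines)
    return not parser.can_fetch(agent, "/")


if __name__ == "__main__":
    print("block-all rules:", homepage_blocked(BLOCK_ALL))
    print("allow-all rules:", homepage_blocked(ALLOW_ALL))
```

A check like this could run as part of a deployment step so that a stray "Disallow: /" is noticed before the spiders see it.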
Useful information
The Robots Exclusion Protocol
A Web site administrator can define which parts of the site should not be visited by a robot by providing a specially formatted file on the site, at http://.../robots.txt
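Since the file lives at the root of the host, its address can be derived from any page URL on the same site. A small Python sketch, assuming a made-up host www.example.com and a hypothetical helper name robots_txt_url:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the host that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/shop/cart.html?id=3"))
```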
The Robots META tag
A Webmaster can indicate whether a page may be indexed, or analysed for links, through a special HTML META tag.
Search engine bots consider this information when deciding whether to index a particular page. The most popular bots implement this feature.
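The tag in question takes the form <meta name="robots" content="noindex, nofollow">, where noindex asks bots not to index the page and nofollow asks them not to follow its links. As a rough sketch of how a bot might read it, the Python snippet below extracts those directives with the standard html.parser module. The class name RobotsMetaParser and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, nofollow".
            for item in content.split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots META directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks bots neither to index it nor to follow its links.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```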
Robots.txt tools
Google Sitemaps built-in robots.txt analysis tool
After registering with the Google Sitemaps tool, you can test URLs against your robots.txt file to find out whether a particular Web page is blocked for Googlebot.