What is Robots.txt and Its Importance from an SEO Perspective

What is Robots.txt?

Robots.txt is a plain text file that tells search engine bots which files and folders of a website they should not crawl, as declared within the file itself.

User-agent: *
Disallow: /

In the syntax above, "User-agent: *" means the rules apply to all search engine bots, while "Disallow: /" tells them not to crawl any files or folders of the website. A robots.txt file can contain multiple user-agent groups, each with its own list of directory and file paths.


Below are a few examples of robots.txt directives:

1) To block the entire website, use the syntax below:
User-agent: *
Disallow: /

2) To block a particular folder of the website, use the syntax below:
User-agent: *
Disallow: /temp/

3) To block a specific URL of the website, use the syntax below:
User-agent: *
Disallow: /sale/products.html

4) To block a single bot from crawling the website, use the syntax below:
User-agent: Bingbot
Disallow: /

5) To declare a sitemap in robots.txt, use the syntax below:
Sitemap: http://www.domain.com/sitemap-products.xml
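Putting the examples above together, a single robots.txt file can combine several user-agent groups and a sitemap declaration. The paths and domain below are placeholders for illustration:

```
User-agent: *
Disallow: /temp/
Disallow: /sale/products.html

User-agent: Bingbot
Disallow: /

Sitemap: http://www.domain.com/sitemap-products.xml
```

Each "User-agent" line starts a new group, and the Disallow rules beneath it apply only to that group.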

How does the robots.txt file work?
Search engines use robots.txt to understand which pages they should crawl and which they should not. When a crawler arrives at your website, it first looks for the robots.txt file to get an overview of which URLs it is allowed to crawl.
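To see how a crawler interprets these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules and URLs are hypothetical examples, not real sites:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration
rules = [
    "User-agent: *",
    "Disallow: /temp/",
    "Disallow: /sale/products.html",
]

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved crawler consults these rules before fetching each URL
print(rp.can_fetch("*", "http://www.domain.com/index.html"))       # True: not disallowed
print(rp.can_fetch("*", "http://www.domain.com/temp/cache.html"))  # False: under /temp/
```

Real crawlers fetch the live file from www.domainname.com/robots.txt and apply the same matching logic before requesting any page.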

How to create Robots.txt file and where to place it?
Since it is a plain text file, you can open Notepad and save the file as robots.txt (the filename is case sensitive; if it is wrong, it will not work). Then list all the files and folders that crawlers should not crawl. Once created, place the file in the root directory of the website, after which it is publicly accessible at www.domainname.com/robots.txt.

Terms used while declaring in robots.txt
User-agent: Specifies which web crawler the following directives apply to.
Disallow: Specifies which files and folders are blocked from crawling.
Allow: Specifies which files and folders may be crawled, even within a disallowed path.
Sitemap: Declares the location of the sitemap so search engines can find and crawl it.
Crawl-delay: Specifies how many seconds the crawler should wait between requests; note that not all search engines honor this directive.
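The directives above can appear together in a single group. A hypothetical example, with placeholder paths:

```
User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /private/open-page.html
Sitemap: http://www.domain.com/sitemap.xml
```

Here everything under /private/ is blocked except the one page explicitly allowed, and compliant crawlers wait 10 seconds between requests.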
