What is Robots.txt?
Robots.txt is a text file used to tell search engine bots not to crawl particular files and folders of a website, as declared in the file itself.
User-agent: *
Disallow: /
In the syntax above, "User-agent: *" means the rules apply to all search engine bots, while "Disallow: /" indicates that no files or folders of the website should be crawled by any search engine. The text file can contain multiple user agents as well as multiple directory paths of files and folders.
Below are a few examples of robots.txt:
1) If you want to block the complete website using the robots.txt file, use the below syntax:
User-agent: *
Disallow: /
2) If you want to disallow particular folders of the website, use the below syntax (replace /folder-name/ with the folder to be blocked):
User-agent: *
Disallow: /folder-name/
3) If you want to block a particular URL of the website, use the below syntax:
User-agent: *
Disallow: /page-name.html
4) If you want only a single bot (for example, Bing's crawler Bingbot) to crawl the website, use the below syntax:
User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /
5) If you want to declare the sitemap in robots.txt, use the below syntax:
Sitemap: https://www.domainname.com/sitemap.xml
How does the robots.txt file work?
Search engines use robots.txt to understand which pages of a website should be crawled and which should not. Before a crawler starts crawling your website, it looks up the robots.txt file to get an overview of which URLs of the website it is allowed to crawl.
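As a rough sketch, this crawler-side check can be reproduced with Python's standard urllib.robotparser module (the bot name, domain, and /private/ folder below are hypothetical examples):

```python
from urllib import robotparser

# Hypothetical rules a site might publish at /robots.txt.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A well-behaved crawler runs this check before fetching each URL.
print(rp.can_fetch("MyBot", "https://www.domainname.com/private/page.html"))  # False
print(rp.can_fetch("MyBot", "https://www.domainname.com/index.html"))         # True
```

Because the rules are declared under "User-agent: *", they apply to any bot name passed to can_fetch.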
How to create Robots.txt file and where to place it?
Since robots.txt is a plain text file, you can open Notepad and save the file as robots.txt (the filename is case sensitive; if it is spelled differently, it will not work). Then include all the files and folders that crawlers should treat as the list of URLs not to be crawled. Once the file is created, place it in the root directory of the website, after which it can be accessed publicly by typing www.domainname.com/robots.txt
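The same file can also be generated programmatically instead of in Notepad; here is a minimal sketch in Python (the blocked folder /admin/ is a hypothetical example):

```python
# Build the robots.txt content line by line.
rules = "\n".join([
    "User-agent: *",
    "Disallow: /admin/",  # hypothetical folder to block
    "Sitemap: https://www.domainname.com/sitemap.xml",
])

# The filename must be exactly "robots.txt" (lowercase); the file then
# needs to be uploaded to the web root so it is reachable at
# www.domainname.com/robots.txt.
with open("robots.txt", "w") as f:
    f.write(rules + "\n")
```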
Terms used when declaring rules in robots.txt
User-agent: Indicates which web crawler the rules that follow are meant for.
Disallow: Indicates which files and folders are to be blocked from crawling.
Allow: Indicates which files and folders are allowed to be crawled.
Sitemap: The sitemap URL can also be included in the robots.txt file so search engines can find and crawl it.
Crawl-delay: Indicates how many seconds the crawler should wait before loading and crawling page content.
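As a sketch of how these directives are read in practice, Python's standard urllib.robotparser module can parse all five of them (the folder names and domain below are hypothetical examples). Note that this parser applies the first matching rule, so the Allow line is listed before the Disallow line:

```python
from urllib import robotparser

# Hypothetical robots.txt using all five directives described above.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /blog/",        # crawling this folder is permitted
    "Disallow: /private/",  # this folder is blocked
    "Crawl-delay: 10",      # wait 10 seconds between requests
    "Sitemap: https://www.domainname.com/sitemap.xml",
])

print(rp.can_fetch("*", "/blog/post.html"))  # allowed
print(rp.can_fetch("*", "/private/"))        # blocked
print(rp.crawl_delay("*"))                   # 10
print(rp.site_maps())                        # declared sitemap URLs
```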