What is a robots.txt file?
A robots.txt file is a plain text file that contains directives for search engine bots. Before a search engine bot crawls a URL on a site such as http://example.com, it normally requests the site's robots.txt file first to check whether it is permitted to crawl that URL. The file cannot force any robot to obey it, but most robots honor it; the main exceptions are spam bots and malware bots that scan the web for security vulnerabilities. The robots.txt file plays a significant role in on-page SEO, so it is worth understanding well.
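The check a well-behaved bot performs before crawling can be sketched in Python with the standard library's urllib.robotparser module. This is only an illustration, not how any particular search engine is implemented; the bot name ExampleBot, the /private/ path and the helper name may_crawl are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real bot would download it from
# http://example.com/robots.txt before requesting any other URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in ("http://example.com/", "http://example.com/private/report.html"):
    verdict = "crawl" if may_crawl(ROBOTS_TXT, "ExampleBot", url) else "skip"
    print(url, "->", verdict)
```

A live crawler would typically call set_url() and read() on the parser to fetch the file over the network instead of passing the text in directly.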
How to create a robots.txt file?
You can use any plain text editor, such as Notepad or WordPad, to create this file. Write the directive lines in the file and save it as plain text under the name robots.txt. Note that the file name must be entirely lower case: use robots.txt, not Robots.txt.
Example #1. A simple robots.txt file:
User-agent: *
Disallow:
In the example above, the first line, User-agent: *, means the directives apply to all search engine bots. The second line, Disallow: with nothing after it, means that nothing is disallowed, so all search engine bots may crawl and index the site.
Example #2. Disallow crawling of the entire site:
User-agent: *
Disallow: /
In example #2, the second line, Disallow: /, means that no search engine bot is permitted to crawl any part of the site.
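The difference between an empty Disallow: and Disallow: / can be checked with Python's urllib.robotparser. This is a small sketch; the bot name ExampleBot, the page index.html and the helper can_crawl are assumptions used only for illustration.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = ["User-agent: *", "Disallow:"]    # as in Example #1
BLOCK_ALL = ["User-agent: *", "Disallow: /"]  # as in Example #2

def can_crawl(lines, user_agent, url):
    """Parse the given robots.txt lines and test one URL for one bot."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(user_agent, url)

url = "http://example.com/index.html"
print(can_crawl(ALLOW_ALL, "ExampleBot", url))  # True
print(can_crawl(BLOCK_ALL, "ExampleBot", url))  # False
```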
Example #3. To disallow a single robot:
User-agent: Googlebot
Disallow: /
In this example, all bots are permitted to crawl the site except Googlebot, which is not permitted to crawl anything.
Example #4. To allow a single robot only:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
In the example above, every bot except Googlebot is disallowed from crawling the site. Only Googlebot is permitted.
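Per-robot rules like these can be tried out with Python's urllib.robotparser. The sketch below assumes that a rule set blocking only Googlebot consists of a single Googlebot group with Disallow: /; OtherBot and the helper can_crawl are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set that blocks Googlebot only.
BLOCK_GOOGLEBOT = ["User-agent: Googlebot", "Disallow: /"]

# Rule set from Example #4: Googlebot only is allowed.
ONLY_GOOGLEBOT = [
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
]

def can_crawl(lines, user_agent, url="http://example.com/"):
    """Parse the given robots.txt lines and test one URL for one bot."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(user_agent, url)

print(can_crawl(BLOCK_GOOGLEBOT, "Googlebot"))  # False
print(can_crawl(BLOCK_GOOGLEBOT, "OtherBot"))   # True
print(can_crawl(ONLY_GOOGLEBOT, "Googlebot"))   # True
print(can_crawl(ONLY_GOOGLEBOT, "OtherBot"))    # False
```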
The next example shows how to prevent bots from crawling a specific directory or file on your site. Suppose your site has a directory named tutorial that contains several HTML files named abc.html, xyz.html and 123.html, and you do not want bots to crawl this folder or those files. The directives are as follows:
Example #5. Disallow crawling of a specific directory or files:
User-agent: *
Disallow: /tutorial/

OR

User-agent: *
Disallow: /tutorial/abc.html
Disallow: /tutorial/xyz.html
Disallow: /tutorial/123.html
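The two variants of Example #5 are not quite equivalent: blocking the directory also covers files added later, while listing files blocks only those files. A short Python sketch with urllib.robotparser shows the difference; the paths /index.html and /tutorial/new.html, the bot name ExampleBot and the helper blocked_paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

WHOLE_DIRECTORY = ["User-agent: *", "Disallow: /tutorial/"]
LISTED_FILES = [
    "User-agent: *",
    "Disallow: /tutorial/abc.html",
    "Disallow: /tutorial/xyz.html",
    "Disallow: /tutorial/123.html",
]

def blocked_paths(lines, paths, user_agent="ExampleBot"):
    """Return the subset of paths that the rules forbid for user_agent."""
    parser = RobotFileParser()
    parser.parse(lines)
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

paths = ["/index.html", "/tutorial/abc.html", "/tutorial/new.html"]
print(blocked_paths(WHOLE_DIRECTORY, paths))  # ['/tutorial/abc.html', '/tutorial/new.html']
print(blocked_paths(LISTED_FILES, paths))     # ['/tutorial/abc.html']
```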
Example #6. Disallow a specific directory except one file:
User-agent: *
Disallow: /tutorial/
Allow: /tutorial/xyz.html
The example above disallows the “tutorial” folder but allows the single file xyz.html inside it.
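The Allow exception works because crawlers that follow the Robots Exclusion Protocol standard (RFC 9309), Google among them, apply the most specific matching rule, meaning the one with the longest path, and prefer Allow on a tie. The function below is a minimal sketch of that precedence for plain path prefixes only; is_allowed and the (directive, prefix) rule format are assumptions for illustration, not part of any library.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under longest-match precedence.

    rules is a list of (directive, prefix) pairs, for example
    ("disallow", "/tutorial/"). The longest matching prefix wins; on a
    tie, "allow" wins. A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        # An empty prefix restricts nothing, so it never decides the outcome.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

EXAMPLE_6 = [("disallow", "/tutorial/"), ("allow", "/tutorial/xyz.html")]
print(is_allowed(EXAMPLE_6, "/tutorial/xyz.html"))  # True
print(is_allowed(EXAMPLE_6, "/tutorial/abc.html"))  # False
print(is_allowed(EXAMPLE_6, "/index.html"))         # True
```

Some parsers instead take the first matching rule in file order (Python's urllib.robotparser appears to behave this way), so putting the Allow line before the Disallow line is the safer habit if you need to support them.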
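Sometimes your website may generate URLs that contain a question mark (?). This is familiar to anyone who works with WordPress, because WordPress creates this type of URL by default. You can block access to all URLs that include a question mark (?) with the following directives, which rely on the * wildcard supported by major search engines such as Google:
User-agent: *
Disallow: /*?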
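A rule commonly used for this is Disallow: /*?, where * stands for any run of characters. The sketch below shows one way such wildcard patterns could be matched, assuming the conventions documented by Google (* matches anything, a trailing $ anchors the end of the URL). The function names and sample paths are hypothetical, and Python's own urllib.robotparser does not support these wildcards.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    end of the path. Everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def matches(pattern, path):
    """Return True if the robots.txt pattern applies to path."""
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*?", "/?p=123"))          # True
print(matches("/*?", "/blog/post?id=7"))  # True
print(matches("/*?", "/blog/post.html"))  # False
```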
Using the robots meta tag:
Meta tags go in the head section of an HTML page. If a page has no robots meta tag, the default is “index, follow”, meaning search bots may index the page and follow its links. You can add a robots meta tag to each page of your website to define rules specific to that page: whether the page may be indexed and whether its links may be followed. The robots meta tag can be used in the following ways:
<meta name="robots" content="noindex, follow"> <!-- Page may not be indexed; links may be followed -->
or
<meta name="robots" content="noindex, nofollow"> <!-- Page may not be indexed; links may not be followed -->
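To see how a bot might read these tags, here is a minimal Python sketch built on the standard library's html.parser. It handles only the noindex and nofollow directives discussed here and falls back to the “index, follow” default when no tag is present; the class and function names are assumptions for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )

def page_policy(html):
    """Return whether a page may be indexed and its links followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }

print(page_policy('<head><meta name="robots" content="noindex, follow"></head>'))
# {'index': False, 'follow': True}
print(page_policy("<head><title>No robots tag</title></head>"))
# {'index': True, 'follow': True}
```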
The main differences between the robots meta tag and the robots.txt file are:
- The robots.txt file controls bot activity across the entire site.
- The robots meta tag defines rules for a single page only.
One thing to note: if a page carries a “noindex, nofollow” robots meta tag and robots.txt also disallows that URL, the meta tag may have no effect, because bots will not fetch the page and so will never see the tag.
Some important points to consider when using a robots.txt file:
- It must be placed in the root directory of your site. For example: http://example.com/robots.txt
- Paths in this file are case sensitive. For example, if you have two files named file.html and File.html, the directive Disallow: /file.html blocks only file.html and has no effect on File.html. So be careful.
- It is better for every site to have a robots.txt file, even if it is empty or contains only the default directives User-agent: * and Disallow:
- If robots.txt tells bots not to crawl a specific file, Google may still show that URL (the URL only) in its search results when it discovers the file through links from external sources. To keep the URL out of search results entirely, use the robots noindex meta tag in the head section of the HTML page, and leave the page crawlable so that bots can see the tag.
- The robots.txt file is publicly available. Anyone can open it and see which sections of your site you have allowed or disallowed.
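Two of the points above, the root location and the case sensitivity of paths, can be illustrated with a short Python sketch using the standard library. The helper name robots_url and the bot name ExampleBot are assumptions for the example.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Return the root-level robots.txt URL for the site serving page_url."""
    return urljoin(page_url, "/robots.txt")

print(robots_url("http://example.com/tutorial/abc.html"))
# http://example.com/robots.txt

# Paths are compared case-sensitively, so only the lower-case file is blocked.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /file.html"])
print(parser.can_fetch("ExampleBot", "http://example.com/file.html"))  # False
print(parser.can_fetch("ExampleBot", "http://example.com/File.html"))  # True
```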
You can learn more about the robots.txt file from Google support.