What Is Robots.txt And How Is It Useful For Blogger (BlogSpot)?
Search engines such as Google, Bing, and Yahoo send out spiders or crawlers, programs that travel all around the web. When these crawlers reach your blog or site, they first read your robots.txt file to check for any restrictions before crawling and indexing your pages.
Robots.txt is a text file that helps the bots and crawlers of search engines such as Google, Bing, and Yahoo crawl and index your blog or site effectively. It is a set of rules for the crawlers and search engine bots visiting your site, telling them which Blogger posts, pages, or links should or should not be indexed in search results. This means you can restrict search engine bots from crawling and indexing certain posts, pages, or links on your website or blog.
The robots.txt file is also read by crawlers from other kinds of sites, such as social media platforms like Facebook and Twitter, and by SEO tools like online keyword research services.
Each Blogger blog comes with a default robots.txt file that looks something like the one below. You can check your blog's robots.txt file by appending /robots.txt to your domain name (https://www.yoursite.blogspot.com/robots.txt or https://www.yoursite.com/robots.txt). You can also generate a robots.txt file for your blog using the Custom Robots.txt Generator Tool For Blogger.
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.blogger4ever.net/sitemap.xml
In this robots.txt file, you can also declare the location of your sitemap. A sitemap is a file on the server that lists the permalinks of all posts and pages of your website or blog. It is usually in XML format, i.e., sitemap.xml.
What Are These?
First you need to know what a user agent is: a software agent or client that acts on your behalf. Mediapartners-Google is the user agent for Google AdSense, which is used to serve more relevant ads on your Blogger blog or site based on your content. If you disallow this user agent, you won't see any ads on the blocked pages.
User-agent: * – A rule group marked with an asterisk (*) applies to all crawlers and robots, whether search engine bots, affiliate crawlers, or any other client software.
Disallow: By adding a Disallow rule you tell robots not to crawl (and therefore not index) the matching pages. The default Disallow: /search blocks your blog's search results, i.e., everything under the /search directory that comes right after your domain name. For example, a search page such as http://yoursite.blogspot.com/search/label/yourlabel will not be crawled or indexed.
Allow: Allow: / explicitly permits search engines to crawl everything under the root path, i.e., your homepage, posts, and pages.
Sitemap: The Sitemap line points crawlers to your sitemap so that all your accessible pages can be crawled and indexed; in the default robots.txt you can see that your Blogger blog explicitly directs crawlers to its sitemap.
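The rules above can be verified with Python's standard-library robots.txt parser. This is just a sketch: the rules mirror the default Blogger file shown earlier, and the URL paths are placeholders.

```python
# Check the default Blogger robots.txt rules with Python's stdlib parser.
from urllib.robotparser import RobotFileParser

# The default rules from the file above, fed to the parser line by line.
rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Search-result pages under /search are blocked for ordinary crawlers...
print(parser.can_fetch("*", "/search/label/yourlabel"))  # False
# ...but regular posts and pages remain crawlable.
print(parser.can_fetch("*", "/2013/04/some-post.html"))  # True
# The AdSense crawler (Mediapartners-Google) has an empty Disallow,
# so nothing is restricted for it and ads can be matched to any page.
print(parser.can_fetch("Mediapartners-Google", "/search/label/yourlabel"))  # True
```

This also shows why rule order matters: for the * group, /search is matched before the broader Allow: / rule.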
What Posts/Pages Should I Disallow In Blogger?
This question is a little tricky, and I cannot predict which pages you should allow or disallow on your blog. You can disallow pages such as your privacy policy, terms & conditions, affiliate links, and labels, as well as search results; it all depends on you.
How To Disallow Posts/Pages In Blogger Using Robots.txt
You can prevent search engines from crawling and indexing particular posts or pages in Blogger using your robots.txt file. To disallow a particular post, add Disallow: /year/month/your-post-url.html to your robots.txt file; that is, copy the part of the post URL that comes after your domain name and place it after Disallow:. To disallow a particular page, copy the page URL after your domain name and add it like this: Disallow: /p/your-page.html.
For example, for a particular post: Disallow: /2013/04/how-to-create-professional-blogger.html
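Putting the steps above together, a customized Blogger robots.txt that blocks one post and one page (the post and page URLs here are placeholders, as is the sitemap address) would look like this:

```text
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /2013/04/how-to-create-professional-blogger.html
Disallow: /p/your-page.html
Allow: /

Sitemap: https://www.yoursite.blogspot.com/sitemap.xml
```

Note that the custom Disallow lines go inside the User-agent: * group, above the Allow: / line, so they apply to all crawlers.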
You might also like: Custom Robots.txt Generator Tool For Blogger & Wordpress
How To Add Custom Robots.Txt File In Blogger Blogspot
How To Remove ?m=1 From Blogger Blogspot URLs