News publishers and the robots.txt file

OK, so this week, for those who missed it, there was some news regarding Google indexing news publishers' sites.
We all know that if you do not want Google to index your website, you simply add the following to your robots.txt file:

User-agent: *
Disallow: /

But for some reason, news publishers act as if all of their content will inevitably be indexed by the search engine. According to other blogs, the publishers want to charge Google for access to their sites; in other words, they want Google to pay to index their content. That is unlikely to happen, of course, because they already know they can block the crawler easily.
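For example, a publisher that only wanted to keep Google out, while still letting other search engines in, could target Google's crawler by its user-agent token instead of using the wildcard (Googlebot is the token Google's documentation gives for its main crawler):

User-agent: Googlebot
Disallow: /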

Well-behaved search engines have always checked for permission before crawling the pages of a website. Webmasters, including news publishers, are aware of this and use the Robots Exclusion Protocol (REP) to tell search engines whether their site, or an individual page, may be crawled.
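To see what that permission check looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs are placeholders, and a real crawler would of course do far more than this:

from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (placeholder domain).
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()

# Ask whether a given user agent may fetch a given page.
page = "https://example.com/news/story.html"
if robots.can_fetch("Googlebot", page):
    print("Allowed to crawl", page)
else:
    print("robots.txt disallows crawling", page)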
