SEO Speedwagon

A Guide to Robots Exclusion Protocol

Google's Prashanth Koppula wrote a ready-to-bookmark post over at the official Webmaster Tools blog that catalogs the robots exclusion protocol (REP) directives and the ways each can be implemented. The following is a list of the directives discussed, grouped by method of implementation:

Directives for the robots.txt file:

  • Disallow
  • Allow
  • $ Wildcard
  • * Wildcard
  • Sitemaps location
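
To make the robots.txt directives a little more concrete, here is a minimal Python sketch. The sample file contents, the example.com URL, and the helper names (parse_directives, pattern_to_regex, rule_matches) are all my own illustrations, not anything taken from the Google post. The matching assumes the commonly documented semantics: * matches any run of characters, a trailing $ anchors the end of the URL, and everything else is matched as a literal prefix. It does not try to model how an engine resolves conflicts between Allow and Disallow.

```python
import re

# Hypothetical robots.txt contents illustrating the directives listed above.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /private/press/
Disallow: /*?sessionid=
Disallow: /*.pdf$
Sitemap: http://www.example.com/sitemap.xml
"""


def parse_directives(text):
    """Return (name, value) pairs for each 'Name: value' line in a robots.txt body."""
    directives = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)  # split once so URLs keep their colons
        directives.append((name.strip().lower(), value.strip()))
    return directives


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """True if the URL path falls under the given Allow/Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None
```

With these helpers, rule_matches("/*.pdf$", "/docs/guide.pdf") is true, while the same pattern does not match "/docs/guide.pdf?page=2", because the $ insists that the URL end in .pdf.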

Meta tags for insertion into HTML:

  • NOINDEX
  • NOFOLLOW
  • NOSNIPPET
  • NOARCHIVE
  • NOODP

Of special note are the two different wildcard uses; the post links to usage models for each. One additional funny bit is in the explanation of NOARCHIVE, in which the post describes the tag's usage as "Do not make available to users a copy of the page from the Search Engine cache." Contrast this with "Do not cache the page," which I believe is most people's idea of the tag's effect. I love little semantic hooks like that.

The post notes that the directives above are observed by Google, Yahoo, and MSN/Live, which is a nice bonus. In addition, the post discusses some directives that only Google honors, such as UNAVAILABLE_AFTER (which I discussed about a year ago), NOIMAGEINDEX, and NOTRANSLATE.

I appreciate what the engines are doing with these REP advancements. Think of the basic Robotstxt.org protocol as the vehicle and the engines as after-market accessory specialists, showing you how to get additional mileage, power, and stunts out of your car.

Copyright 2005-2007 Intrapromote, LLC