Error in Google's robots.txt Docs
October 20, 2008
Update: This was fixed rapidly; see Riona's comment.
I don't want to get too deep into the complexities of robots.txt parsing (if you want that, try this, this or this), but I found something odd at the bottom of this page, one of Google Webmaster Help's many pages on robots.txt.
The page says:
URLs are case-sensitive. For instance, Disallow: /private_file.asp would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file1.asp.
Here's a picture just so you trust me:
[Screenshot of the Google Webmaster Help page showing the paragraph quoted above]
This is wrong in a lot of different ways. Let's look at them piece by piece, with my comments following each quoted fragment.
"URLs are case-sensitive." So far, so good.
"For instance, Disallow: /private_file.asp would block http://www.example.com/junk_file.asp" It would? How?
"..., but would allow http://www.example.com/Junk_file1.asp."
I suppose Disallow: /private_file.asp would allow /Junk_file1.asp, but not because of capitalization. It's because /Junk_file1.asp has nothing to do with the excluded file, /private_file.asp.
So what did they mean? If they're anything like me, this was a paragraph started, edited a few times, and never really finished. It appears to be trying to touch on several of the issues discussed on the page, including capitalization, pattern matching, and wildcard characters. Here are a couple of alternatives I'd suggest:
URLs are case-sensitive. For instance, Disallow: /private_file.asp would block http://www.example.com/private_file.asp, but would allow http://www.example.com/Private_file.asp.
or, to continue along the pattern-matching theme also discussed on the page, this would work:
URLs are case-sensitive. For instance, Disallow: /private_file*.asp would block http://www.example.com/private_file.asp, and would also block http://www.example.com/private_file1.asp. It would not, however, block /Private_file1.asp.
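To make the distinction concrete, here is a minimal Python sketch of the matching behavior the two suggested rewrites describe. It assumes the semantics discussed on the Google help page: a Disallow pattern is matched case-sensitively against the start of the URL path, and `*` stands for any run of characters. The helper name `rule_matches` is hypothetical and not part of any library; this is an illustration, not Google's actual parser.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern would block the given URL path.

    Assumed semantics (a sketch, not Google's real implementation):
    case-sensitive prefix match, '*' matches any run of characters,
    and a trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path, which gives prefix matching.
    return re.match(regex, path) is not None


# The example from the original help page: the rule has nothing to do
# with either "junk" file, so neither one is blocked.
print(rule_matches("/private_file.asp", "/junk_file.asp"))    # False
print(rule_matches("/private_file.asp", "/Junk_file1.asp"))   # False

# First suggested rewrite: capitalization alone decides the outcome.
print(rule_matches("/private_file.asp", "/private_file.asp"))  # True
print(rule_matches("/private_file.asp", "/Private_file.asp"))  # False

# Second suggested rewrite: the wildcard widens the match, case still matters.
print(rule_matches("/private_file*.asp", "/private_file1.asp"))  # True
print(rule_matches("/private_file*.asp", "/Private_file1.asp"))  # False
```

The first two calls show why the original sentence was confusing: under these assumptions /junk_file.asp is never blocked by a rule for /private_file.asp, whatever its capitalization.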
This is a pretty minor detail at the bottom of an esoteric page, but if you're looking for specific information on cap style and robots.txt, it could cause some head-scratching.
posted by Erik Dafforn at October 20, 2008 07:30 AM
Comments
Good catch! Sad that the help docs on Google seem to be lacking in the care that they desperately need. Maybe a bit of feedback will get their behinds in gear...
Posted by: David Millar at October 20, 2008 10:01 AM
Thanks for the note, Dave; I try to balance my desire for accuracy with the knowledge that they don't really owe anyone any sort of documentation. In general, I think they're very good at communication; finds like this are pretty rare.
Posted by: Erik at October 20, 2008 12:23 PM
Hey Erik,
Seems like something indeed went a bit awry here, and thanks very much for the heads-up!
Do note, though, that we typically can't update pages immediately; due to our interests in keeping documents in sync across the dozens of languages we support and that sort of thing, we tend to update our docs in batches.
But I'll point this out to our Webmaster Help folks right away and trust that they'll patch things up as soon as they can.
Thanks again for the note.
Posted by: Adam Lasnik at October 20, 2008 02:02 PM
Hi Erik! I write the content for the Webmaster Tools Help Center. Thanks for pointing this out! I've updated the page and you should be able to see the changes now.
Best wishes
Posted by: Riona at October 20, 2008 07:25 PM
I understand what they're trying to say; however, they should really finish their documents and pages before putting them online.
However, if it's anything like their software products, it'll never reach a finished state and will always be in BETA.
Posted by: Dave - Google SEO Blog Tips at October 23, 2008 07:48 PM

