Google Indexes Its Own Toolbar Content(?)
I don't think this is a particularly big deal, but I am fascinated by crawler behavior and the wheres and whys of crawlers not honoring sites' specific robots directives.
And it makes it even more interesting when the robot and the site belong to the same company.
A few weeks ago, I was trying to find out exactly when Google overtook Yahoo in the race for search engine market share. (It's not important why, but it will help you understand why I was searching for such an odd phrase.)
I ended up searching for this query:
["google passes yahoo" "search market share" 2004]
And the results page looked like this:

If you click over, you can clearly see that we're in the /archivesearch portion of the toolbar.google.com site:
If you go to the Google Toolbar site's robots.txt file, however, you'll see that this portion is supposed to be off-limits to Googlebot:

(Note: This robots.txt file also has certain "allow" commands, but none that should pertain to this particular page.)
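If you want to check this kind of thing yourself, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `EXAMPLE_ROBOTS_TXT`, the example URLs, and the `is_allowed` helper are all assumptions made for illustration; they are not the actual contents of the Toolbar site's robots.txt file. Also note that Python's parser may resolve mixed allow/disallow rules differently than Googlebot does, so treat this as a rough check, not a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real robots.txt
# served by toolbar.google.com.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archivesearch
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so split the text first.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A URL under the disallowed path (the query string is made up).
blocked_url = "http://toolbar.google.com/archivesearch?q=example"
# A URL outside the disallowed path (also made up).
open_url = "http://toolbar.google.com/index.html"

archive_allowed = is_allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", blocked_url)
index_allowed = is_allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", open_url)

print("archivesearch allowed:", archive_allowed)
print("index allowed:", index_allowed)
```

To test a live site instead, you could point the parser at a real file with `set_url()` and `read()` rather than parsing a hard-coded string.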
But wait. Couldn't this just be an "uncrawled reference" -- that rare-but-easily-recreated instance where Google indexes a page based on incoming links but doesn't actually crawl it, and therefore still honors the robots.txt exclusion protocol?
No, I don't think so, at least in this case. Uncrawled references generally don't have snippets attached to them, and if you look at the SERP above, you'll see a snippet pulled from deep within the actual page:

I'm not claiming to know each subtle nuance of uncrawled references, but I study robots exclusion pretty closely, and this is the first instance I've seen of a section from within an excluded page being used as its snippet.
I'm certainly willing to concede that Google just happened to find this information somewhere else and attributed it to this page, but before I make that concession, I'd want someone to show that this is what actually happened. I'm not tied to any particular outcome; I'd just like to learn more about why this happens.