
Google Indexes Its Own Toolbar Content(?)

November 23, 2009

Erik Dafforn

I don't think this is a particularly big deal, but I am fascinated by crawler behavior and the wheres and whys of crawlers not honoring sites' specific robots directives.

And it makes it even more interesting when the robot and the site belong to the same company.

A few weeks ago, I was trying to find out exactly when Google overtook Yahoo in the race for search engine market share. (It's not important why, but it will help you understand why I was searching for such an odd phrase.)

I ended up searching for this query:

["google passes yahoo" "search market share" 2004]

And the results page looked like this:


Google SERP for [


If you click over, you can clearly see that we're in the /archivesearch portion of the toolbar.google.com site:


The URL we land on falls in the /archivesearch directory of the Toolbar site.


If you go to the Google Toolbar site's robots.txt file, however, you'll see that this portion is supposed to be off-limits to Googlebot:


A selected portion of the Google Toolbar site's robots.txt file.

(Note: This robots.txt file also has certain "allow" commands, but none that should pertain to this particular page.)
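If you want to test this kind of thing yourself rather than eyeball the file, here is a minimal sketch using Python's standard-library robots.txt parser. The rules in EXAMPLE_RULES are hypothetical stand-ins I made up for illustration (one "allow" line and one "disallow" line for /archivesearch); they are not a copy of the actual Toolbar robots.txt, and the helper name is_crawl_allowed is my own. Also keep in mind that Python's parser applies rules in file order, which may not match how Googlebot resolves overlapping allow and disallow lines.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only. They are NOT the real
# contents of the Google Toolbar site's robots.txt file.
EXAMPLE_RULES = [
    "User-agent: *",
    "Allow: /intl/",
    "Disallow: /archivesearch",
]


def is_crawl_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example URLs are assumptions chosen to exercise each rule.
    for url in (
        "http://toolbar.google.com/archivesearch?q=test",
        "http://toolbar.google.com/intl/en/",
    ):
        print(url, is_crawl_allowed(EXAMPLE_RULES, "Googlebot", url))
```

Under these assumed rules, the /archivesearch URL comes back as blocked and the /intl/ URL as allowed. To check a live site instead, RobotFileParser also offers set_url() and read() to fetch the real file.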

But wait. Couldn't this just be an "uncrawled reference" -- that rare-but-easily-recreated instance where Google indexes pages based on incoming links, but doesn't actually crawl the page, so therefore still honors the robots.txt exclusion protocol?

No, I don't think so, at least in this case. Uncrawled references generally don't have snippets attached to them, and if you look at the SERP above, you'll see a snippet pulled from deep within the actual page:


A portion of the page from the Google Toolbar site from which its snippet is pulled.


I'm not claiming to know each subtle nuance of uncrawled references, but I study robots exclusion pretty closely, and this is the first instance I've seen of a section from within an excluded page being used as its snippet.

I'm certainly willing to concede that Google just happened to find this information somewhere else and attributed it to this page, but before I make that concession I'd want someone to show that this is what actually happened. I'm not tied to any particular outcome; I'd just like to learn more about why this happens.

posted by Erik Dafforn at November 23, 2009 5:22 PM

Comments

A couple colleagues at Google checked into this, and we crawled the page before the robots.txt directive was put in place. When we try to recrawl that page we'll see that it's blocked in robots.txt and won't crawl it after that.

Posted by: Matt Cutts at November 24, 2009 11:12 AM

Great article, this is some interesting stuff. I would like to find out more regarding this and would be interested in people's comments.

Posted by: Kenneth at November 24, 2009 12:19 PM

Really good article about google toolbar. I learn a lot from here.

Posted by: PPC campaigns at May 7, 2010 7:17 AM

Want to borrow money without a BKR check? The options for this are growing; look further and discover how you can borrow money, quickly and easily.

Posted by: lenen zonder bkr toetsing at August 3, 2010 4:23 PM

Mortgages? Lots of mortgage information: different mortgage types, mortgage interest rates, the National Mortgage Guarantee, and how to compare mortgages.

Posted by: hypotheek at August 8, 2010 6:51 PM


I liked reading this, like your blog design too. Is it wordpress?

Posted by: Forum Avatars at August 12, 2010 3:07 PM

Finally, a good post about this subject! I can not believe I had to go thru like 20 blogs just to come across this excellent post. All their content together still can not equal the content of this article. I have learned a lot and will bookmark your web page. Maintain up the good job!

Posted by: Joe at August 14, 2010 1:06 AM

This is some info! Didn't know about all these stuff till now. Thanks for posting it.

Posted by: giken at August 15, 2010 9:21 PM

Incredible!!! Bookmarked this page that has this extremely good references. Will come back to see if there are any updates. You, the author, are a master. Thanks

Posted by: never fail list building system at August 22, 2010 6:33 PM

You genuinely know your stuff. Genuinely wish I've read this sooner! I feel so ignorant haha.

Posted by: hot grils at August 25, 2010 9:10 PM

Hmm.!.. it is a post I'm willing to take a bullet for. Undoubtedly hits the mark. I have some minor concerns but I don't want to begin a long post and someone could possibly flame me. Just wish to keep this blog civil and clean. Wouldn't like any hatemail would i? lol Keep it up!

Posted by: send money to philippines at August 29, 2010 3:54 AM

You genuinely know your stuff. Genuinely wish I've read this sooner! I feel so ignorant haha.

Posted by: Beni at August 29, 2010 11:25 AM

I like the structure of the posts. Basic and straight to the point. I bet you are able to even do better. Write a lengthy article and show us what you are able to do. I have no doubt you'll create even better information. I have subscribed to a lot of blogs but this one is really a keeper!

Posted by: Palawan Underground River at August 30, 2010 2:25 AM


Copyright 2005-2008 Intrapromote, LLC