Are You Giving Away Links You Don't Know About?
November 15, 2006
Recently, I was looking through a client's list of indexed pages at Yahoo Site Explorer. (Get ready for another "All Hail YSE" post.) I noticed what looked like a lot of junk pages, and in tracking them down I found a vulnerability that many sites could share.
Link spammers had been attacking the site with an interesting attempt to get more links to their sites:
- Use the site's internal search feature to create a search results page that "searched" for links to the spam sites
- Get my client's site to output a search results page that links to the spam site
- Link to that spammy search results page to get it crawled and indexed
If none of that makes sense, here's an example. Let's say the spammers were trying to create links to Apple Computer (they weren't). They go to your internal search box and type the following:
[Image: the site's search box with an HTML link typed into it, using "iPod stuff" as the anchor text]
...and then hit Submit.
Their goal is that your site outputs a search results page that includes text showing the search term. For example, this is what they want the search results page to say:
Search Results for iPod stuff (with "iPod stuff" rendered as a live link to Apple's site)
Next, they link to that results page from their own site (or some site in their ugly network) so that it gets crawled and indexed. And voilà - they have a new link pointing to their site - from yours.
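To make the mechanism concrete, here is a minimal Python sketch of a results page that echoes the query straight into its heading. Everything in it is an assumption made for illustration: the `render_results_heading` template, the `LinkCollector` helper, and the `spam.example` URL are hypothetical and are not taken from the client's site.

```python
from html.parser import HTMLParser


def render_results_heading(query: str) -> str:
    """Hypothetical naive template: echoes the raw query into the page."""
    return "<h1>Search Results for " + query + "</h1>"


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, roughly as a crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


def outbound_links(page: str) -> list:
    """Return every link target found in the given HTML."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


# The "search term" is really an HTML link (hypothetical spam URL).
spam_query = '<a href="http://spam.example/">iPod stuff</a>'
page = render_results_heading(spam_query)
print(outbound_links(page))  # ['http://spam.example/']
```

An ordinary query such as `iPod stuff` produces no links at all. The problem only appears when the query itself contains markup and the page repeats it unchanged.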
The spammer's plot failed for several reasons - one of which is that my client's site does not output a heading (or any text) that lists the search term.
But many sites do. So be careful: if you have an internal search engine that outputs unique search URLs containing the query string, make sure nobody is getting more of your site indexed than you'd like.
Based on the client's unique needs, fixing the issue isn't as easy as you might think. We're looking for ways to ensure that this doesn't happen in the future, including some creative uses of robots.txt, changing form methods, and some contact with Yahoo.
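As one illustration of the robots.txt idea, the sketch below uses Python's standard `urllib.robotparser` to check which URLs a rule would close off to compliant crawlers. The `/search` path, the `www.example.com` host, and the `crawl_allowed` helper are assumptions for the example, not the client's actual setup or the fix that was eventually chosen.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def crawl_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the rules above let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(crawl_allowed("http://www.example.com/search?q=ipod"))  # False
print(crawl_allowed("http://www.example.com/products/"))      # True
```

A rule like this only asks well-behaved crawlers to stay out of search result pages; it does not change what those pages output. Switching the search form from GET to POST approaches the problem from the other side, since a POST request does not put the query into a unique, linkable URL.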
posted by Erik Dafforn at November 15, 2006 05:12 PM
Comments
I have heard of this happening before, and I still think it is somewhat clever. This is also why it is so important to sanitize incoming content when developing a Web application.
In this case, converting the quotation marks to their HTML equivalents in output (i.e., &quot;) would have prevented the hyperlink creation altogether, as it would have been rendered literally, and not as a hyperlink.
Posted by: Trey at November 23, 2006 08:01 AM
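Trey's suggestion can be sketched in Python with the standard library's `html.escape`, which converts quotation marks and angle brackets to HTML entities. The `render_results_heading` template and the `spam.example` URL are hypothetical, used here only to show the effect.

```python
import html


def render_results_heading(query: str) -> str:
    """Hypothetical template that escapes the query before echoing it."""
    return "<h1>Search Results for " + html.escape(query, quote=True) + "</h1>"


spam_query = '<a href="http://spam.example/">iPod stuff</a>'
print(render_results_heading(spam_query))
# <h1>Search Results for &lt;a href=&quot;http://spam.example/&quot;&gt;iPod stuff&lt;/a&gt;</h1>
```

A browser or crawler reading that output sees the markup as literal text, so no link is created.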

