November 20, 2008

Google Lets Users Promote, Remove, Comment on Listings

posted by Erik Dafforn in category: Google

This has been discussed for a few months here and there, but this is the first time I've seen it in the wild. Google SERPs are giving users the ability to "promote," "remove," or "comment" on listings:

New buttons in SERPs enable user feedback

Here's a closeup of the three. See if you can figure out which is Promote, Remove, or Comment:

Closeup of the three feedback buttons

I've only begun to play with them, so I have no idea what the implications are. I suspect that, as with most things, Google will harness the data and use it in aggregate to try to improve the relevance of results. I'm sure we'll read more about that in the next couple of days, along with the imminent speculation about "what it all means," which, in the grand scheme, is usually very little. Still, it's cool.

Posted by Erik Dafforn at 10:32 PM | Comments (0) | TrackBacks (0)

Google SERPs Showing MySpace + other Videos

posted by Erik Dafforn in category: Universal SEO

I'm surely not the first to notice this, but I saw MySpace video thumbnails in Google SERPs for the first time today:

MySpace video thumbnail alongside a YouTube video

Looking around, G is pulling from multiple sources, including MetaCafe, CollegeHumor, and this example from Spike:

Great result? Or the GREATEST result?

A couple months ago, AccuraCast noticed two video results in a horizontal line, but in that sample, both videos were from Google-owned YouTube.

This is the next logical step in the universality of Universal Search, so to speak. Is it also the beginning of the end of big corporate presence on shared video sites?

Posted by Erik Dafforn at 06:32 PM | Comments (0) | TrackBacks (0)

November 07, 2008

Social Media Reality Check: Facebook vs. MySpace

posted by Erik Dafforn in category: Social Media

Submitted without comment:

Facebook vs. MySpace

Posted by Erik Dafforn at 09:26 AM | Comments (2) | TrackBacks (0)

October 28, 2008

Follow Intrapromote on Twitter

posted by Erik Dafforn in category: Social Media

We've been using Twitter as an internal communications tool for a while as a "protected" feed. In the spirit of TwitterGlasnost, however (and because we were surprised that several people found the feed and requested to follow it), we want to open it up.

What's in the stream?

  • Links to posts from this blog
  • Links to other SEO-related posts and articles from Intrapromote staffers
  • SEO/M "required reading" -- a list of important SEO/SEM-related articles from around the web that our staff members have shared with one another
  • Any upcoming speaking gigs or seminars we'll be attending
  • The obligatory, enigmatic "and anything else we can think of..." items
So you are cordially invited to follow @intrapromote. No RSVP required.

Posted by Erik Dafforn at 08:28 AM | Comments (1) | TrackBacks (0)

October 20, 2008

Error in Google's robots.txt Docs

posted by Erik Dafforn in category: Crawling and Indexing

Update: This was fixed rapidly; see Riona's comment.

I don't want to get too deep into the complexities of robots.txt parsing (if you want that, try this, this or this), but I found something odd at the bottom of this page, one of Google Webmaster Help's many pages on robots.txt.

The page says:

URLs are case-sensitive. For instance, Disallow: /private_file.asp would block http://www.example.com/junk_file.asp, but would allow http://www.example.com/Junk_file1.asp.

Here's a picture just so you trust me:

Screenshot of the robots.txt passage from Google Webmaster Help

This is wrong in a lot of different ways. Let's look at them piece by piece, with my comments following each quoted fragment.

URLs are case-sensitive.
So far, so good.
For instance, Disallow: /private_file.asp would block http://www.example.com/junk_file.asp
It would? How?
..., but would allow http://www.example.com/Junk_file1.asp.

I suppose Disallow: /private_file.asp would allow /Junk_file1.asp, but not because of capitalization style. It's because /Junk_file1.asp has nothing to do with the excluded file, /private_file.asp.

So what did they mean? If they're anything like me, this was a paragraph started, edited a few times, and never really finished. It appears to try to cover a variety of the issues covered on the page, including cap style, pattern matching, and wildcard characters. Here are a couple alternatives I'd suggest:


URLs are case-sensitive. For instance, Disallow: /private_file.asp would block http://www.example.com/private_file.asp, but would allow http://www.example.com/Private_file.asp.

or, to continue along the pattern-matching theme also discussed on the page, this would work:

URLs are case-sensitive. For instance, Disallow: /private_file*.asp would block http://www.example.com/private_file.asp, and would also block http://www.example.com/private_file1.asp. It would not, however, block /Private_file1.asp.
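To make the case-sensitivity point concrete, here is a minimal Python sketch of how a Google-style robots.txt path rule could be matched against a URL path. It assumes the semantics Google describes for its own crawler: case-sensitive prefix matching, * matching any run of characters (including none), and a trailing $ anchoring the end of the URL. The rule_matches helper is hypothetical. It is not Google's actual parser.

```python
import re


def rule_matches(rule_path, url_path):
    """Return True if a robots.txt Allow/Disallow path matches url_path.

    Sketch of Google-style matching (an assumption, not Google's code):
    prefix match, '*' = any run of characters, trailing '$' = end anchor.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape literal characters; each '*' becomes the regex '.*'.
    regex = ".*".join(re.escape(part) for part in rule_path.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start, which gives prefix semantics.
    # No re.IGNORECASE flag, so matching stays case-sensitive.
    return re.match(regex, url_path) is not None


print(rule_matches("/private_file.asp", "/private_file.asp"))    # True
print(rule_matches("/private_file.asp", "/Private_file.asp"))    # False
print(rule_matches("/private_file*.asp", "/private_file1.asp"))  # True
```

Note that /private_file*.asp also matches /private_file.asp, because * can match zero characters.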

This is a pretty minor detail at the bottom of an esoteric page, but if you're looking for specific information on cap style and robots.txt, it could cause some head-scratching.

Posted by Erik Dafforn at 07:30 AM | Comments (6) | TrackBacks (0)

September 19, 2008

Linking External and Internal Search Terms in Google Analytics

posted by Erik Dafforn in category: User Behavior

Have you ever wanted to match up internal search terms (i.e., terms that people searched for from your site's internal search feature) with their corresponding external search terms (i.e., terms that people used to find your site in the first place)?

In Google Analytics you can, and while finding the information is not particularly intuitive the first time, it's pretty quick once you know how to do it.

First, of course, you have to set up Site Search, which simply amounts to identifying your site's specific search parameter for Google Analytics so it can scrape the query terms out of your site's search results URLs. Once you've done that (and have begun to gather data for a little while), you're ready to go.
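As a rough illustration of what that setup involves, the sketch below pulls the query term out of a search results URL. You tell Google Analytics to do the same thing by naming the query parameter. The parameter name q and the example URL are assumptions. Use whatever parameter your own search script puts in its URLs.

```python
from urllib.parse import urlparse, parse_qs


def extract_site_search_term(url, param="q"):
    """Return the internal search term carried in `param`, or None.

    The default parameter name "q" is an assumption; check your own
    search results URLs for the real one.
    """
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


print(extract_site_search_term("http://www.example.com/search?q=404+redirect"))
# prints: 404 redirect
```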

Next, drill down to the Content | Site Search | Search Terms report, as shown here:

Finding internal search terms in Google Analytics

This shows you all the terms that people searched for on your site, from within your own internal search feature, in the given time period. Pick a term and click it, as shown here:

The list of internal search terms

The resulting screen is the Search Term Overview, which tells you how many people searched for that term, etc. From the Dimension drop-down list, select Keyword, as shown below. This tells Google Analytics to report which external keyword was used by the visitor(s) who eventually searched for "404 redirect" (or whatever search term you selected).

The report for a specific internal search term

The resulting screen will list the keyword that the user searched for at a search engine to first arrive at your site. In this case, the user searched for "seo using 404 301," as shown here:

External keyword report for the internal search term "404 redirect"

If you have a popular search term on your site, the image above would likely be populated with several different external search terms. In this example, however, only one person searched for "404 redirect" on the site in the time period, so there's only one external search phrase that drove the traffic. To find the referring engine, select Source instead of Keyword from the Dimension drop-down.
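Conceptually, the Keyword dimension is a cross-tab of two things Google Analytics already knows about each visit: the external keyword that brought the visitor in, and the internal terms he or she searched for afterward. Here's a minimal Python sketch of that pivot over hypothetical visit records. The record layout is an assumption and does not reflect how Google Analytics actually stores its data.

```python
from collections import Counter, defaultdict


def external_keywords_by_internal_term(visits):
    """Map each internal search term to a Counter of the external keywords
    used by visitors who later searched for that term on the site."""
    report = defaultdict(Counter)
    for visit in visits:
        for term in visit["internal_searches"]:
            report[term][visit["external_keyword"]] += 1
    return report


# Hypothetical visits; the first mirrors the "404 redirect" example.
visits = [
    {"external_keyword": "seo using 404 301", "internal_searches": ["404 redirect"]},
    {"external_keyword": "301 redirect seo", "internal_searches": ["301 redirect", "canonical"]},
]
report = external_keywords_by_internal_term(visits)
print(report["404 redirect"].most_common())  # [('seo using 404 301', 1)]
```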

Exactly what to do with this data is the topic for a separate post, which I hope to have ready soon.

Posted by Erik Dafforn at 11:50 AM | Comments (0) | TrackBacks (0)

August 23, 2008

Intrapromote Welcomes Angela Moore as Director of Link Development

posted by Erik Dafforn in category: Link Building

I wanted to let everyone know how happy we are with a new addition to our staff. Angela Moore has joined us as Director of Link Development, a position that we built around her significant experience and skills. She'll be managing a team and will really broaden the scope of our link building services. We have already seen great things and expect that to continue.

Here's the release. Angela is also a mod at SEW Forums and is already a veteran blogger, so keep an eye on our link-building category (& feed). Welcome, Angela.

Posted by Erik Dafforn at 08:26 AM | Comments (0) | TrackBacks (0)

August 18, 2008

Twitter and the "Black Box" of Reputation Management

posted by Erik Dafforn in category: Social Media

I keep reading stories about social media sites like Twitter and how they're revolutionizing customer service. Comcast. H&R Block. Southwest Airlines. On and on. (The organic tie-in here is that for many companies, pages like Twitter profiles -- as well as the news stories that discuss them -- are already showing up on SERPs for company names, and that's going to continue for a while.)

All this is great, of course. But at the same time, it reminds me of an old joke I'm sure you've heard. If an airplane's "black box" is the single, indestructible element of the plane that is nearly always recoverable after a crash, why don't they just make the whole plane out of black box?

Silly, I know. But similarly, if using something like Twitter is the perfect, efficacious form of customer service we've all been waiting for, why is it the exception instead of the rule? Why do companies frequently use social media to apologize for more traditional forms of customer service that garner complaints, instead of propagating these rapid-response techniques across their traditional customer service and support environments? It's a cynical perspective, but I think one reason that Twitter users get quick reaction and kid-glove treatment is that their complaints "have legs." In other words, they're being broadcast to the world, not just to the company. If a company doesn't respond to your forum post or answer your email, yet they respond to your Tweet in 12 minutes, part of you should be happy, and part of you should be angry. You're being addressed because your method of complaint has the most potential to harm them.

If a company had a queue set up so that any 800 call or support forum post that languished unanswered for 24 hours was re-broadcast as a press release, now THAT would be some accountability.

Posted by Erik Dafforn at 03:12 PM | Comments (0) | TrackBacks (0)

August 08, 2008

The Difference Between Crawling and Indexing

posted by Erik Dafforn in category: Crawling and Indexing

I talk a lot about crawling and indexing (to the point that we have a dedicated category), but I think it's worthwhile to back up and describe some of what's going on.

The terms crawling and indexing (and indexing's cousin, caching) are frequently used together, but you should not consider them synonyms.

Exact definitions probably differ from person to person, but following is how I explain the processes:

Crawling is the process of an engine requesting -- and successfully downloading -- a unique URL. Obstacles to crawling include no links to a URL, server downtime, robots exclusion, or using links (such as some JavaScript links) from which bots cannot find a valid URL.

Indexing is the result of successful crawling. I consider a URL to be indexed (by Google) when an info: or cache: query produces a result, signifying the URL's presence in the Google index. Obstacles to indexing can include duplication (the engine might decide to index only one version of content for which it finds many nearly identical URLs), unreliable server delivery (the engine may decide to not index a page that it can access during only one-third of its attempts), and so on.

What's the difference between crawling and indexing, in terms of time? Here's a recent example: I watched a newly introduced URL to see when it would be indexed, monitoring the text cache query of the URL every four hours starting when the URL went live on July 2. (This URL was one of a number of URLs linked to on a new site map.)

On July 17, the text cache showed results and finally stopped saying "Your search - cache:[URL] - did not match any documents." But what was interesting is that the cached file showed the results of the URL "as retrieved on 8 Jul 08." So make special note that the URL was crawled and cached over a week before it appeared in the index.

A better, more comprehensive test would be to watch server logs and see how many times the file was requested, and with what frequency, between the original request date and date at which the cache query showed results. Additional testing would try to detect ways to shorten that time by increasing the number (and prominence) of incoming links and so on.
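For the server-log version of that test, a minimal Python sketch like the following could tally successful (HTTP 200) Googlebot requests for one URL per day. It assumes Apache's combined log format and trusts the user-agent string. A careful test would also verify Googlebot via reverse DNS. The function name and sample log lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_day(lines, target_path):
    """Count successful Googlebot requests for target_path, keyed by date."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") != target_path:
            continue
        # Only a 200 counts as a successful download (i.e., a crawl).
        if m.group("status") != "200" or "Googlebot" not in m.group("agent"):
            continue
        stamp = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits[stamp.date()] += 1
    return hits


sample = [
    '66.249.66.1 - - [08/Jul/2008:10:15:32 -0400] "GET /new-page.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(googlebot_hits_by_day(sample, "/new-page.html"))
```

Comparing these daily counts with the first date the cache query returns a result gives the crawl-to-index lag for that URL.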

Posted by Erik Dafforn at 12:04 PM | Comments (4) | TrackBacks (0)

July 01, 2008

Google to Index Flash Content ... Again

posted by Erik Dafforn in category: Crawling and Indexing

In a post last night entitled "Improved Flash Indexing," the Google Webmaster Tools blog reports that

We've improved our ability to index textual content in SWF files of all kinds. This includes Flash "gadgets" such as buttons or menus, self-contained Flash websites, and everything in between. ... In addition to finding and indexing the textual content in Flash files, we're also discovering URLs that appear in Flash files, and feeding them into our crawling pipeline—just like we do with URLs that appear in non-Flash webpages. For example, if your Flash application contains links to pages inside your website, Google may now be better able to discover and crawl more of your website.

This brings up several satellite issues:

  • Since it's been so difficult to index Flash content, a virtual cottage industry sprang up with ways to circumvent that limitation, including methods like SWFObject, sIFR, user-agent-based delivery of plain text vs. Flash content, and so on. With these techniques becoming more sophisticated and easy to implement, is it likely that sites will abandon them soon?
  • It appears that for now, Flash files spawned when users fail a JavaScript test will still be uncrawlable, since engines too typically fail a JS sniffer.
  • If you have a SWF file embedded as only a part of a larger HTML page, trust me that you do NOT want only that SWF file being returned in search results. It typically looks awful, lacking both the size requirements you implemented, as well as the critical navigation that resides in your HTML. The Webmaster Central post didn't say that SWF files would be returned in SERPs, so I'm not saying that's what will happen. But I've tested client sites by searching for strings of text that only appear in Flash files, and I've seen it happen. So test with your own site and cross your fingers.

I chose a somewhat sarcastic post title because ever since search engines and Flash have butted heads, the ability for engines to index text embedded in Flash files has been "just around the corner." In 2002, for example, hearts were briefly aflutter about the Macromedia Flash Search Engine SDK, which was going to be the end of engines' inability to index Flash content. Hear that? The end. 2002.

So I enter into this new era with guarded optimism. Optimistic because Google never releases anything "new" until it's been tested in the wild for months or years. Guarded because the "right" recommendation for clients is never quite as black and white as people think it will be.

Posted by Erik Dafforn at 09:18 AM | Comments (2) | TrackBacks (0)

June 27, 2008

Exactly How Accurate IS Google Trends for Websites?

posted by Erik Dafforn in category: Web Analytics

Much has been made of the week-old announcement that Google is in the traffic trending game. I weighed in earlier this week at ClickZ, focusing mostly on ways you can benefit from the information and largely sidestepping the already-trodden issues of Google being the only company able to opt out of the reporting, etc.

One question that hasn't been discussed to death, however, is the actual accuracy of the traffic numbers that Google is reporting. I ran some numbers on some sample sites and laid the Google Trends lines over the actual traffic numbers:

Example 1:
Google Trends line overlaid on actual traffic (site 1)

Example 2:
Google Trends line overlaid on actual traffic (site 2)

The verdict? In general, Google doesn't do too badly, especially considering that neither of the sites above uses Google Analytics or Urchin to measure its traffic.

The peaks and valleys are roughly similar. Roughly. Yet the scale is off pretty dramatically, with Google underreporting the traffic on one of the sites by a factor of two.

So my recommendation is this: to gauge large trends (seasonality, results of large offline campaigns, etc.), Google Trends is a decent first look. It's probably a safe bet that when you plot two sites within the same vertical, their relative lines will be more or less accurate when contrasted. But don't trust it for raw numbers.
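If you want to put a number on "roughly similar" versus "off by a factor of two," a small Python sketch like this one compares an estimated series (say, Google Trends readings) against your actual traffic. The ratio of the totals captures the scale error, and a Pearson correlation captures whether the peaks and valleys line up. The function and sample figures are hypothetical. Trends values would first have to be read off the chart for the same dates as your own numbers.

```python
from math import sqrt


def compare_series(actual, estimated):
    """Return (scale_factor, pearson_r) for an estimate vs. actual traffic.

    scale_factor > 1 means the estimate under-reports; pearson_r near 1
    means the shapes (peaks and valleys) agree even if the scale is off.
    """
    if len(actual) != len(estimated) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    scale = sum(actual) / sum(estimated)
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return scale, cov / sqrt(var_a * var_e)


# Hypothetical weekly visits: the estimate has the right shape at half the scale.
print(compare_series([1000, 1200, 900, 1500], [500, 600, 450, 750]))  # (2.0, 1.0)
```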

Just to be fair, Google never said it was 100% accurate, stating in the post that "because data is estimated and aggregated over a variety of sources, it may not match the other data sources you rely on for web traffic information."

Posted by Erik Dafforn at 02:25 PM | Comments (1) | TrackBacks (0)

June 04, 2008

A Guide to Robots Exclusion Protocol

posted by Erik Dafforn in category: Crawling and Indexing

Google's Prashanth Koppula wrote a ready-to-bookmark post over at the official Webmaster Tools blog, showing tons of different robots-exclusion protocol (REP) directives that can be implemented in various ways. Following is a listing of directives discussed and the methods of implementation:

Directives for the robots.txt file:

  • Disallow
  • Allow
  • $ Wildcard
  • * Wildcard
  • Sitemaps location

Meta tags for insertion into HTML:

  • NOINDEX
  • NOFOLLOW
  • NOSNIPPET
  • NOARCHIVE
  • NOODP

Of special note are the two different wildcard uses; the post links to usage models for each. One additional funny bit is in the explanation of NOARCHIVE, in which the post describes the tag's usage as "Do not make available to users a copy of the page from the Search Engine cache." Contrast this with "Do not cache the page," which I believe is most people's idea of the tag's effect. I love little semantic hooks like that.

The post notes that the directives above are observed by Google, Yahoo, and MSN/Live, which is a nice bonus. In addition, the post discusses some directives that only Google honors, such as UNAVAILABLE_AFTER (which I discussed about a year ago), NOIMAGEINDEX, and NOTRANSLATE.
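As a quick reference, here's a minimal Python sketch that assembles a robots.txt file and a robots meta tag from the directives listed above. The helper names, paths, and sitemap URL are hypothetical. The * and $ patterns are honored only by engines that support those extensions.

```python
def build_robots_txt(disallow, allow=(), sitemap=None, user_agent="*"):
    """Assemble a single-group robots.txt body (hypothetical helper)."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        # The Sitemap line stands apart from any User-agent group.
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"


def robots_meta_tag(*directives):
    """Build a robots meta tag, e.g. for NOINDEX, NOFOLLOW, NOARCHIVE."""
    content = ", ".join(d.lower() for d in directives)
    return f'<meta name="robots" content="{content}">'


print(build_robots_txt(["/*.pdf$", "/*?"], allow=["/public/"],
                       sitemap="http://www.example.com/sitemap.xml"))
print(robots_meta_tag("NOINDEX", "NOARCHIVE"))
```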

I appreciate what engines are doing with the REP advancements. If the basic Robotstxt.org protocol is the vehicle, the engines have become after-market accessory specialists, showing you how to get additional mileage, power, and stunts out of your car.

Posted by Erik Dafforn at 08:07 AM | Comments (0) | TrackBacks (0)

May 22, 2008

Tumblr and SEO: A Case Study in Rapid Response

posted by Erik Dafforn in category: Social Media

Here's a quick case study in how social media sites (more important, the conversations going on at social media sites) are enabling companies to interact with and respond to their users.

Here's the rough chronology. I may have missed some letters in the middle, but points A and Z are pretty accurate.

  1. Melissa Chang runs a blog on her own domain, using the Tumblr platform. (For the uninitiated, Tumblr is roughly similar to Blogger or WordPress, although many people seem to use "Tumblogs" as a middle ground between article-length posts and Twitter-like microblog posts.) She is unhappy with her search traffic and writes a post saying so.
  2. Steve Rubel reads the post and bookmarks it at Del.icio.us.
  3. Steve's bookmark shows up at FriendFeed, where he aggregates his various social media endeavors.
  4. A conversation begins at FriendFeed about whether, and to what extent, the Tumblr platform is or is not search-friendly. A somewhat lively and mostly constructive discussion takes place.
  5. Others lend various perspectives at their own blogs.
  6. Tumblr reps follow -- and join -- the FriendFeed conversation(s).
  7. Tumblr responds on its official blog, saying it has already made many of the changes that came from the discussion on FriendFeed and elsewhere.
  8. Many are happy with the changes; some are not. My personal opinion is that Tumblr may have entered the egg-breaking stage of omelet-making. The site will be better off in the long run.

So a logical question is, how is a "conversation" like the one at FriendFeed different from Tumblr users merely writing to the Tumblr staff and making the same recommendations -- which some users claim they've been doing for a while? I don't know the answer to that. But I think the interest in and productivity resulting from the FriendFeed conversation had a lot to do with it.

Back in the day, big brands used to respond to customer letters. I mean respond. Like type up a reply and send it. This is because they realized that for each person who took the time to write or type a letter, stamp it, and walk it down to the mailbox (later known as the "barrier to entry"), there must be about 10,000 people who feel exactly the same way.

Today, you can send an email as easily as you can cook a Hot Pocket. Anyone can do it. So the 10,000:1 ratio of yore is more like 1:1 today. The FriendFeed conversation shows that not only is more than one person affected, but that actual recommendations can be spat out the back end. I think that's why the response was more rapid.

Very soon, this will be the norm in customer relations, at least for progressive, consumer-focused companies.

Posted by Erik Dafforn at 11:15 PM | Comments (0) | TrackBacks (0)

April 30, 2008

Real and Imagined Errors in Google Sitemap Feeds

posted by Erik Dafforn in category: Crawling and Indexing

When you upload your XML sitemap feed to your server -- especially if it's GZipped -- don't expect it to look pretty. I got a nervous call from a client because when he called the XML feed URL in his browser, he saw this:

Firefox's display of the GZipped XML sitemap feed

While it looks like an error, it's really not. Not in the traditional sense, at least. The error here is that your browser (in this case, Firefox) isn't able to view the file without a little help -- specifically, a stylesheet that tells it how it should look to human viewers.

The bottom line is that this message doesn't mean that engines can't read your XML feed -- only that you can't see it. To see whether Google can process it, for example, check the Sitemap Summary report. For some reason, this report isn't in the main GWT left nav. To find it, you need to click the "Details" link at the far right of the Sitemap Overview report. When you click that link, here's what you see:
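If you'd rather confirm the feed's contents yourself than squint at a browser error, a short Python sketch can decompress a GZipped sitemap and list its URLs. It assumes the standard sitemaps.org 0.9 namespace. The sitemap_urls helper is hypothetical, and it only shows that the XML parses. It tells you nothing about how Google processes the file.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(raw):
    """Return the <loc> URLs from sitemap bytes, gzipped or plain."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)  # raises ParseError if the XML is malformed
    return [(loc.text or "").strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


# Usage, assuming a local copy of the feed:
# with open("sitemap.xml.gz", "rb") as f:
#     print(sitemap_urls(f.read()))
```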

Sitemap Summary report in Google Webmaster Tools

Real sitemap errors do exist, even in the example I used above. In this case, I've inadvertently included in the sitemap a URL that I also excluded via robots.txt. So I'm sending Google a mixed message there. Fortunately, the robots.txt file overrides the URL's inclusion in the sitemap, so it ends up being more of a gentle nudge than a true, crippling error. If the error doesn't specifically say that the sitemap is invalid and unreadable, then it's probably not.
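Catching that particular mixed message before Google does is easy to automate. Here's a minimal Python sketch that uses the standard library's urllib.robotparser to flag sitemap URLs that robots.txt disallows. Keep in mind that urllib.robotparser does plain prefix matching, so it won't evaluate Google's * and $ wildcard extensions. The URLs and rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots_txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


robots_txt = "User-agent: *\nDisallow: /ip-login/\n"
urls = ["http://www.example.com/", "http://www.example.com/ip-login/"]
print(blocked_sitemap_urls(robots_txt, urls))  # ['http://www.example.com/ip-login/']
```

Anything this returns is a URL you're simultaneously inviting and excluding. Either drop it from the sitemap or lift the exclusion.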

Posted by Erik Dafforn at 08:52 AM | Comments (0) | TrackBacks (0)

April 23, 2008

Update on Google Showing Excluded URLs as Sitelinks

posted by Erik Dafforn in category: Crawling and Indexing

A little over a month ago, I wrote about Google showing robots-excluded URLs as Sitelinks. Here's a shot of what Google showed for the query [seo speedwagon] in mid-March:

Google showing an excluded URL as a Sitelink

The ip login link was (and is) excluded via robots.txt. A month prior (in February), a link to one of our monthly archives -- a page with the robots "noindex" meta tag -- appeared as a Sitelink also.

Since then, the SERP has been cleaned up. I use the passive voice because I don't exactly know who to thank. Either the algo picked it up on its own, or someone hand-washed it. Either way, it looks better now:

Sitelinks are all 'allowed' URLs now

I'm not sure if we're an isolated case, so if you have any examples of excluded URLs still showing up in Sitelinks, please let us know in the comments.

Posted by Erik Dafforn at 08:12 AM | Comments (4) | TrackBacks (0)

April 01, 2008

Google Serves Ads Based on Previous Queries

posted by Erik Dafforn in category: Adwords

In 2005 (as reported by Search Engine Journal), Google applied for a patent called "Results based personalization of advertisements in a search engine." Part of the patent abstract reads as follows:

The search results are personalized based on a user profile of the user providing the query. The user profile describes interests of the user, and can be derived from a variety of sources, including prior search queries, prior search results, expressed interests, demographic, geographic, psychographic, and activity information.

Until now, I hadn't seen any instances of Adwords being served based on prior queries in the same session. (This doesn't mean it hasn't happened -- only that I haven't seen it.) But recently I've begun to notice it when signed in to my Google account. Each time I've noticed it (it's been hard to reproduce) it typically occurs after several searches for one particular topic, followed by a sudden shift to a query for another topic. For example, here is one recent search pattern:

[laptops]
[laptop repair]
[laptop parts]
[trucks]

Here is the resulting SERP for the [trucks] query. I've compressed the page so you can see both organic and paid results:

Organic and paid results for [trucks]

Here is the query set for the second example:

[gloves]
[work gloves]
[gardening gloves]
[jersey gloves]
[heavy duty gloves]
[wheelbarrows]

And here are the organic/paid results for [wheelbarrow]:

Organic and paid results for [wheelbarrow]

The second example is admittedly less convincing, because it's plausible that glove retailers could purchase bids for "wheelbarrow" terms. But I was unable to see any "glove" ads in subsequent searches for "wheelbarrow" terms.

This is interesting because query results like this allow the ad to really stick out contextually and give the advertiser the whole stage, so to speak, for a certain term. And even though the user has changed gears and is searching for something new, the "old" vein of queries is certainly still in his or her mind. I would love any feedback about how widespread these results are, CTR data for "residual" query ads, etc.

Posted by Erik Dafforn at 07:39 AM | Comments (2) | TrackBacks (0)

March 13, 2008

Google Showing Robots-Excluded Links in Sitelinks

posted by Erik Dafforn in category: Google

You might have noticed that Google rolled out sitelinks for a new batch of sites a couple weeks ago. This blog was included in that batch, as you can see if you do a query for [seo speedwagon].

The goal here isn't to beat up on Google, but I think it's significant enough that site owners should be aware of it. In a couple cases, the sitelinks that Google shows (or showed) for our site have been links specifically excluded from robots, either via robots.txt or by the "noindex" attribute in the robots Meta tag. Following is a screen shot of the [seo speedwagon] query taken on February 26, which is roughly when the new batch of sites started noticing their sitelinks:

Sitelinks for [seo speedwagon], February 26, 2008

Note the two red-outlined links. The one in the left column, ip login, is our staff login page. It's been excluded by our robots.txt file for almost three years. Incidentally, Google couldn't index that page if it wanted to, as it's password-protected. I know that robots.txt exclusion isn't a totally reliable way to keep a URL from showing up in SERPs, as it often causes what's known as a "partially-indexed" URL (example). But come on -- a Sitelink?

The outlined link in the right column (November 2007) is a typical (if capriciously chosen) monthly archive page -- exactly the kind you see in the third column of this blog. They're ugly, more or less useless (both for SEO and for people), and I'll probably eventually do away with them, but for now, there they are. But the important thing here is that I added the robots "noindex" tag to them well over a year ago.

Just this week, Google changed the format slightly. Here's a current shot:

Sitelinks for [seo speedwagon], March 2008

The November 2007 link (excluded via Meta tag) is now off the list (automatically -- I didn't do it), but the ip login link remains.

Yes, I know I could block specific sitelinks from within Webmaster Tools. And I might, but I wanted to show it to you first.

It seems like excluding specific URLs via robots.txt or via the robots meta tag should be a sufficient method of opting URLs out of sitelinks.
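Before blaming the engine, it's worth confirming which of your pages actually carry the robots noindex meta tag. Here's a minimal Python sketch that uses the standard library's html.parser to check a page's HTML. It only inspects a meta tag named robots. It does not look at engine-specific names such as googlebot, and it does not read HTTP headers. The has_noindex helper is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # html.parser lowercases tag and attribute names, but not values.
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for token in values.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def has_noindex(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<head><meta name="robots" content="noindex, follow"></head>'))  # True
```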

This topic is especially timely as Matt Cutts just recently asked users how they'd prefer that a meta-tag-excluded URL appear -- if at all -- in the Google index. As of this writing, 83% say "Don't show a link at all." I don't want to speak for his readership, let alone all site owners, but I can confidently predict that most people don't want a robots-excluded URL (regardless of whether the exclusion mechanism was robots.txt or a robots "noindex" Meta tag) showing up in a Sitelink.

Posted by Erik Dafforn at 10:33 PM | Comments (1) | TrackBacks (0)

March 04, 2008

NYT Traffic Doubles, Revenue Grows Since Killing Subscriptions

posted by Erik Dafforn in category: Old Media

John wrote a few times last fall about the NY Times tearing down its paid subscription wall and allowing spiders in.

Now, in an interview at The Deal, Google's David Eun (on p. 5) confirms that it was a good idea:

We have some partners that have made very bold steps, such as The New York Times, which went from a pay model to a free model. After they went free, the traffic they got from us alone doubled. Their math says they make more money by offering content free to consumers, but stimulating demand and making it work with advertising. The Financial Times did the same thing, and at least early on in the process they experienced at least a 100% growth in traffic.

Don't hold your breath waiting for further breakdown of the math, especially for the NYT example. Note that while Eun says traffic doubled, he was less specific about the money, saying only that "they make more" under the current scenario.

It should be no surprise that it's Google -- not the Times -- telling us the good news about expanded indexation. After all, Google has more to gain from all of us knowing about it, because it now gets a slice of the pie:

NYT Adwords premium ad

Thanks to BeetTV via SearchCap.

NYT Traffic Doubles, Revenue Grows Since Killing Subscriptions
Posted by Erik Dafforn at 10:51 AM | Comments (0) | TrackBacks (0)

February 28, 2008

Heirs Still Fighting Over the Page View Estate Erik Dafforn

posted by Erik Dafforn in category: Web Analytics

Good article in Computerworld this week called Life After Page Views: Web Analytics 2.0.

To sum up, the page view has been tossed into the Pythonesque "bring out your dead" cart by a lot of people, including me, in an article I wrote at ClickZ a year ago:

Page views have long been one of the Web's most reliable measurements. But because of technologies like AJAX, Flash, and RSS, a site can perform at engines better than ever and users can spend as much (or more) time on your site than ever before, but the page view count won't reflect it. Page views rely on Web 1.0's click-and-wait model. ...
Sites with an income model that relies on excellent search engine positioning and subsequent page views must be especially diligent in showing potential advertisers a true picture of the site's user experience. Whether it's shifting the influence of time spent on a site, adding script-based click tracking to internal AJAX applications, or something entirely different, a multifaceted approach to Web measurement is becoming more and more important for Web monetization.

So imagine how vindicated I felt when, last July, Nielsen / NetRatings decided to abandon the page view as the primary web analytics metric. From the CW article:

At the time, the Internet benchmarking firm cited the growing popularity of Asynchronous JavaScript and XML, or AJAX -- which can refresh content without completely reloading a Web page -- as the main reason for the change to measuring time spent on a site.

But it turns out that video, not AJAX widgetry, is the major culprit in the growing chasm between falling page views and climbing "time spent" online. All of which leaves us with the same question: How do we measure consumer engagement in a post-page-view web publishing landscape?
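To make the gap concrete, here's a small Python sketch comparing two hypothetical sessions: a click-and-wait visit, and a visit with one full page load followed by in-page AJAX and video activity. The event names and timings are invented, and "time spent" is measured from the first to the last recorded event. That's a common simplification: it can't see anything the visitor does after the last event fires.

```python
from dataclasses import dataclass


@dataclass
class Event:
    seconds: int  # seconds since the session started
    kind: str     # "pageview" for a full page load; anything else is in-page activity


def session_metrics(events):
    """Return (page_views, seconds_on_site) for one visitor session."""
    if not events:
        return 0, 0
    page_views = sum(1 for e in events if e.kind == "pageview")
    times = [e.seconds for e in events]
    return page_views, max(times) - min(times)


# Hypothetical sessions: classic click-and-wait vs. one load plus AJAX/video activity.
classic = [Event(0, "pageview"), Event(40, "pageview"), Event(90, "pageview")]
ajax_heavy = [Event(0, "pageview"), Event(60, "ajax"), Event(200, "video"), Event(420, "ajax")]

print(session_metrics(classic))     # (3, 90)
print(session_metrics(ajax_heavy))  # (1, 420)
```

By page views, the first visitor looks three times as valuable; by time spent, the second stayed more than four times as long.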

The article is a little too long to sum up quickly, so I do recommend the read. The basic issue is that companies like Nuconomy are trying to be the first out of the gates with new engagement-measuring metrics such as "comments added to blogs, ratings, applications shared with friends, clicks on ads and online video use -- all of which can show how 'engaged' a user is with a particular brand or product," while folks like Avinash Kaushik (Google Analytics guru and recent SEMMY winner) caution us against rushing out and arbitrarily defining concepts while totally abandoning concrete measurements.

"I am not saying don't create engaging experiences," he added. "[Just] don't use the term engagement, because it has been bastardized to the point that it doesn't mean anything."

More questions than answers, certainly, but that's not necessarily bad.

Heirs Still Fighting Over the Page View Estate
Posted by Erik Dafforn at 10:30 PM | Comments (0) | TrackBacks (0)

February 18, 2008

MSN's Berkowitz Pulled from the Index Erik Dafforn

posted by Erik Dafforn in category: MSN

I haven't seen this anywhere except ClickZ and I thought you might be interested. As of last Thursday, Steve Berkowitz, the SVP of Microsoft's Online Services Group, is out. He'll be staying through August "to ensure a smooth transition."

In the big picture, two years doesn't seem like quite enough time to have turned the MSN Search ocean liner around, despite the fact that Berkowitz is credited with Ask's financial turnaround during his tenure there. But someone has to fall on the sword in situations like this, and it looks like he was the logical choice. One wonders whether a simple management shuffle will have a significant effect, or whether it's merely bringing a sharper knife to the gunfight.


MSN's Berkowitz Pulled from the Index
Posted by Erik Dafforn at 04:06 PM | Comments (1) | TrackBacks (0)

February 08, 2008

Predictive Search Merges into Consumer Apps Erik Dafforn

posted by Erik Dafforn in category: Tools

This isn't breaking news, but in their recent versions, both Netflix and iTunes have integrated some very smart internal search utilities into their systems.

They're using a type of search function that goes by several names, including "predictive," "intuitive," and "suggestive," to offer users additional help during the internal search process. Here's an example of Netflix's system in action:

Netflix internal search

The Netflix system appears to list terms in alphabetical order, and the search term itself is always first in the list of suggestions. This is good intuitive search, but it's not as good as iTunes' method:

iTunes internal search

iTunes uses a pretty sophisticated algorithm that appears to rank by popularity (instead of alphabetical order) and, perhaps more important, inserts the typed term anywhere in the query that makes sense -- not just as the first term in the string.
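The difference between the two behaviors described above can be sketched in a few lines of Python. This is only an approximation of what the two apps *appear* to do from the outside, not either company's actual algorithm. The function names, the catalog, and its popularity scores are all hypothetical, and "anywhere in the query" is reduced to a plain substring match, which is cruder than real word-aware matching.

```python
def alphabetical_prefix_suggest(typed, titles, limit=5):
    """Netflix-like (as observed): typed term first, then titles that start with it, A-Z."""
    t = typed.lower()
    starts = sorted((x for x in titles if x.lower().startswith(t) and x.lower() != t),
                    key=str.lower)
    return [typed] + starts[: limit - 1]


def popularity_anywhere_suggest(typed, popularity, limit=5):
    """iTunes-like (as observed): titles containing the term anywhere, most popular first."""
    t = typed.lower()
    hits = [x for x in popularity if t in x.lower()]
    hits.sort(key=lambda x: (-popularity[x], x.lower()))
    return hits[:limit]


# Hypothetical catalog: title -> popularity score (invented numbers).
CATALOG = {
    "Love Actually": 120,
    "Crazy Love": 300,
    "Love Story": 80,
    "Shakespeare in Love": 250,
    "Lovely Bones": 40,
}

print(alphabetical_prefix_suggest("love", CATALOG))
# ['love', 'Love Actually', 'Love Story', 'Lovely Bones']
print(popularity_anywhere_suggest("love", CATALOG))
# ['Crazy Love', 'Shakespeare in Love', 'Love Actually', 'Love Story', 'Lovely Bones']
```

The prefix approach never surfaces "Crazy Love," even though, in this made-up catalog, it's the title most users want.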

Why does predictive search matter? Because when users select the right artist, song, film, whatever -- that's a conversion. These intuitive search features shorten the click path between a user wanting something and getting something. Compare these two potential search paths:

Without intuitive search:

  1. User types terms into a search box
  2. User clicks "submit"
  3. Site (or app) returns search results
  4. User scans search results page
  5. User clicks result that matches his/her query
  6. Site (or app) delivers correct page

With intuitive search:

  1. User types terms into a search box
  2. Site (or app) displays potential queries immediately
  3. User clicks term from dropdown suggestion box
  4. Site (or app) delivers correct page

A click path is like plumbing with loose joints. The more twists, turns, and connections, the more cargo (visitors) you lose due to leakage. In the cases above, the addition of intuitive search reduces the plumbing overhead by a third.
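Here's the plumbing analogy as back-of-the-envelope arithmetic in Python. It assumes a hypothetical, uniform 90% of visitors make it across each "joint" between consecutive steps, which is an invented number; real drop-off varies from step to step.

```python
WITHOUT_SUGGEST = 6  # steps in the first list above
WITH_SUGGEST = 4     # steps in the second list above


def share_completing(steps, keep_rate):
    # One "joint" between each pair of consecutive steps; each joint keeps keep_rate of visitors.
    return keep_rate ** (steps - 1)


print(f"steps removed: {(WITHOUT_SUGGEST - WITH_SUGGEST) / WITHOUT_SUGGEST:.0%}")  # steps removed: 33%
for steps in (WITHOUT_SUGGEST, WITH_SUGGEST):
    print(steps, round(share_completing(steps, 0.90), 3))
# 6 0.59
# 4 0.729
```

Under those assumptions, cutting two steps lifts completion from about 59% to about 73% of the people who started typing.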

Coincidentally, the respective features of Netflix and iTunes parallel those of Google Suggest and Yahoo Search Suggest, which I wrote about a few months ago.

Predictive Search Merges into Consumer Apps
Posted by Erik Dafforn at 07:02 AM | Comments (0) | TrackBacks (0)

January 28, 2008

Keyword Research as a Predictor of Sales Erik Dafforn

posted by Erik Dafforn in category: Keywords

Here's a short but important note about relying on keyword demand to predict general industry sales trends or successes.

Sometimes, keyword demand is a fairly accurate reflector (or predictor) of interest and/or sales:

Search demand for Wii, Xbox, PS2 and PS3 in 2007

Stats: According to this game industry blog, here are the respective console sales for 2007:
Wii: 6.29M units
Xbox: 4.29M units
PS2: 3.97M units
PS3: 2.56M units

And sometimes it is not:

Search demand for HD DVD vs. Blu-ray in 2007

While this chart might correspond roughly to the sales of player units ("578,000 HD DVD and 370,000 Blu-ray machines will be sold by the end of [2007]"), one would be ill-advised to pick a format "winner" from this chart (see this or many other articles like it). Most of the technorati (small "t") realize that a PS3 console also comes with a built-in Blu-ray player, so the people searching for [blu-ray] are only a fraction of the people actually buying into Blu-ray.

Disc sales tell a different story from hardware sales. PC World says "Blu-ray Disc movie titles outsold HD DVD in the United States by a nearly 2-to-1 margin last year, according to sales figures from Home Media Research."

Using trending charts to estimate sheer search volume is a pretty sure bet. But be careful about drawing conclusions about popularity and intent out of those raw numbers.

Keyword Research as a Predictor of Sales
Posted by Erik Dafforn at 12:00 PM | Comments (2) | TrackBacks (0)

January 23, 2008

Duplicate Content - Thinking Inside the 'Big Box' Stores Erik Dafforn

posted by Erik Dafforn in category: Crawling and Indexing

SEOs (myself included) love to preach the Organic Gospel as if knowledge, not implementation, were the barrier. "If people only knew this or that," we say, "they'd be saved."

But a lot of times, knowing the right stuff only leads to the next barrier, which is, "How the heck do we DO it?" Various facets of site maintenance -- user experience and tracking, to name only two -- frequently compete with SEO techniques for front-burner attention.

As an example, I want to look at Circuit City's site and show a couple of examples of things they're doing that aren't optimal from an organic SEO perspective, but that are probably necessary for other reasons. (I should note that I LOVE the CC site from a user's perspective. They -- and a lot of the Big Box stores, to be fair -- have a great way to narrow and expand the choices to help you find exactly the product(s) you're looking for.)

The first example, however, is one I'm fairly critical of, because I feel like whatever benefit they're gaining probably isn't worth it. The CC home page has two links to the main "TV & Home Entertainment" page -- one in the top nav, and one in the lower set of links. You can see the links in the following screen shot, and I've listed them afterward:

Two links to the same content, but two different URLs

Here are the respective links. The dynamic strings diverge immediately after "N=20012866":

http://www.circuitcity.com/ccd/categorySpecial.do?catOid=-12866&N=20012866&c=1

http://www.circuitcity.com/ccd/categorySpecial.do?catOid=-12866&N=20012866&SESSIONLINK&cm_re=011308%20HOME%20PAGE%20A-_-navboxes%20TV-_-TV%20and%20Home%20Entertainment

The content of the pages is identical, so you know where this is going. They're diluting the potential power of each by having a similarly named mirror.
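One way to avoid the mirror is to make every internal link to the page use a single URL. Here's a minimal Python sketch of that idea, assuming (hypothetically) that only `catOid` and `N` decide what the page shows and that `c`, `SESSIONLINK`, and `cm_re` are session or tracking tags. Whether `c=1` carries real meaning on CC's side is unknown, so treat the whitelist as an assumption you'd have to verify. The tracking data would then need to travel some other way, which is exactly the kind of trade-off described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: the parameters assumed to determine what the page shows.
CONTENT_PARAMS = {"catOid", "N"}


def canonical_url(url):
    """Drop query parameters assumed not to change the content (session/tracking tags)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k in CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


top_nav = "http://www.circuitcity.com/ccd/categorySpecial.do?catOid=-12866&N=20012866&c=1"
lower_nav = ("http://www.circuitcity.com/ccd/categorySpecial.do?catOid=-12866&N=20012866"
             "&SESSIONLINK&cm_re=011308%20HOME%20PAGE%20A-_-navboxes%20"
             "TV-_-TV%20and%20Home%20Entertainment")
print(canonical_url(top_nav) == canonical_url(lower_nav))  # True
print(canonical_url(top_nav))
# http://www.circuitcity.com/ccd/categorySpecial.do?catOid=-12866&N=20012866
```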

Let's look at a more complicated example. Consider two more URLs on the Circuit City site, and I'll contrast the query strings and the page contents. I've stripped out the "http://www.circuitcity.com" portion of the URL for brevity, but I've linked to them so you can see for yourself if you want. In addition, after each URL, I've shown the breadcrumb navigation from each page so you can see the subtle difference.

Page 1: TVs that cost $500-$999 that are Sony:
URL: /ssm/Televisions/sem/rpsm/c/1/catOid/-12867/Ns/net_price|0||accm_grs_mgn_dllr|1/link/ref/N/20012866+20012867+312867003+40000229/link/ref/rpem/ccd/categorylist.do

Breadcrumb trail:
circuit-city-breadcrumb-1.jpg

Page 2: TVs that are Sony that cost $500-$999:
URL: /ssm/Televisions/sem/rpsm/c/1/catOid/-12867/Ns/net_price|0||accm_grs_mgn_dllr|1/link/ref/N/20012866+20012867+40000229+312867003/link/ref/rpem/ccd/categorylist.do

Breadcrumb trail:
circuit-city-breadcrumb-2.jpg

The on-page content from these two URLs is identical. After all, no matter in what order you query the database, it should theoretically produce the same products (in this case, three specific TVs). But notice how the last two IDs in the N segment (312867003 and 40000229) are swapped between the two URLs, in effect creating a duplicate. The content (and breadcrumb navigation) is generated based on the order in which the user selects search criteria. This makes for a fantastic user experience -- no doubt about it. But it's hurting them subtly, because engines either crawl too many pages and dilute each one's unique potential for ranking well, or, more likely, the bots hit a nav scheme like this, turn a few corners and crawl a handful of pages, then bail because they recognize what a sinkhole it is.
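A possible fix is to let the breadcrumb follow the user's click order while building the link URL from the refinements in one fixed order. Here's a minimal Python sketch that sorts the "+"-separated IDs after the `/N/` path segment. It assumes (hypothetically) that the IDs are integers and that their order never changes which products come back, which matches what the two pages above show but isn't confirmed by anything on CC's side.

```python
def canonical_refinement_path(path):
    """Sort the '+'-separated refinement IDs after the /N/ segment so click order doesn't matter."""
    segments = path.split("/")
    for i in range(len(segments) - 1):
        if segments[i] == "N":
            ids = segments[i + 1].split("+")
            segments[i + 1] = "+".join(sorted(ids, key=int))  # assumes the IDs are integers
            break
    return "/".join(segments)


PREFIX = "/ssm/Televisions/sem/rpsm/c/1/catOid/-12867/Ns/net_price|0||accm_grs_mgn_dllr|1/link/ref/N/"
SUFFIX = "/link/ref/rpem/ccd/categorylist.do"
page1 = PREFIX + "20012866+20012867+312867003+40000229" + SUFFIX  # Page 1 above: price, then Sony
page2 = PREFIX + "20012866+20012867+40000229+312867003" + SUFFIX  # Page 2 above: Sony, then price

print(page1 == page2)  # False
print(canonical_refinement_path(page1) == canonical_refinement_path(page2))  # True
```

Sorting is just one way to get a stable order. Any fixed order works, as long as every path through the nav produces the same string.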

About a month ago, a WebmasterWorld thread ($upport required) discussed a topic similar to this. Member PageOneResults discussed a client's site, which offers multiple paths and entry points to specific product URLs, with the final product URL varying based on the entry point used and the path taken to that product. Below is a response to his original post, followed by his reaction:

>>If the URL depends on the route taken through the site, then you have a major problem to figure out and fix.

Yes, we have a major challenge ahead of us in regards to the one example provided where there were 10 access points for one product. That takes into consideration 5 under www and 5 under non www which is what is happening.

I'm all for as many access points as can be possibly provided as to not hamper the visitor experience. And, as pointed out, as long as that product leads to the same URI from all access points, life is sweet. But, that is not the case...

One important note is that PageOneResults is aka Edward Lewis, who runs SEO Consultants (of which Intrapromote is a proud member). Edward has probably forgotten more about SEO in the last 12 hours than I have ever known, so when he asks for input, it's not due to lack of knowledge. The bottom line is, this stuff can get extremely complicated regardless of your SEO knowledge level.
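The www vs. non-www half of that ten-access-point problem is the more mechanical one. Here's a minimal Python sketch, assuming a hypothetical example.com site that prefers the www host. It shows which URLs a host-level 301 would collapse. Note that it only halves the count: the path-dependent variants (the hypothetical `/sale/` path here) still need product URLs that don't depend on the route taken.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"           # hypothetical canonical host
ALIASES = {"example.com", "www.example.com"}  # hosts that serve the same site


def canonical_host(url):
    """Rewrite the bare domain to the preferred www host (what a 301 redirect would target)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIASES:
        host = PREFERRED_HOST
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def group_by_canonical(urls):
    """Map each canonical URL to the access-point URLs that collapse into it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_host(url), []).append(url)
    return groups


access_points = [
    "http://example.com/product/123",
    "http://www.example.com/product/123",
    "http://example.com/sale/product/123",
    "http://www.example.com/sale/product/123",
]
for canonical, variants in group_by_canonical(access_points).items():
    print(canonical, len(variants))
# http://www.example.com/product/123 2
# http://www.example.com/sale/product/123 2
```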

Duplicate Content - Thinking Inside the 'Big Box' Stores
Posted by Erik Dafforn at 11:35 AM | Comments (1) | TrackBacks (0)

January 07, 2008

All Eyes on Wikia Search Launch Erik Dafforn

posted by Erik Dafforn in category: SEO Industry News

More than a year after the initial news, Wikia Search officially launched this morning. I won't bore you with the reviews, which are mixed (although seldom neutral).

Probably the funniest line came from Matt Cutts, whose

...reaction is pretty simple: congrats to the Wikia crew on your public launch, and welcome to the search industry! I’m glad that you’re jumping into the search space.

This seems a little like Tom Brady welcoming his grandmother to the pickup scrimmage at the family reunion.

All Eyes on Wikia Search Launch
Posted by Erik Dafforn at 07:46 AM | Comments (0) | TrackBacks (0)

December 19, 2007

Supplemental Index, We Hardly Knew Ye Erik Dafforn

posted by Erik Dafforn in category: Crawling and Indexing

The Google Webmaster Central Blog put another nail (the final one?) in the coffin of Google's infamous Supplemental Index just now by declaring it fully folded into the main index:

We improved the crawl frequency and decoupled it from which index a document was stored in, and once these "supplementalization effects" were gone, the "supplemental result" tag itself -- which only served to suggest that otherwise good documents were somehow suspect -- was eliminated a few months ago. Now we're coming to the next major milestone in the elimination of the artificial difference between indices: rather than searching some part of our index in more depth for obscure queries, we're now searching the whole index for every query.

This is, in my opinion, much more significant than the prior act of simply removing the "Supplemental Index" label. The main problem has never been the label applied (or not applied) to URLs, but the fact (or at least the fear) that SI pages were being given short shrift in their efforts to contend for queries. So what's the intended result?

From a user perspective, this means that you'll be seeing more relevant documents and a much deeper slice of the web, especially for non-English queries. For webmasters, this means that good-quality pages that were less visible in our index are more likely to come up for queries.

Of course the onus doesn't fall entirely on Google here. SI pages were SI for a reason. If you think they're worth ranking for, the old rules still apply. Make sure you remove any obstacles to crawling and indexing that may remain, and try to get some additional links -- internal and external -- pointing to them.

Supplemental Index, We Hardly Knew Ye
Posted by Erik Dafforn at 12:29 PM | Comments (2) | TrackBacks (0)

December 14, 2007

Big Update at Google Analytics Erik Dafforn

posted by Erik Dafforn in category: Web Analytics

Late yesterday, the Google Analytics team announced a major update to its free analytics package.

Taking full advantage of the upgrade requires something I'm sure the GA team wishes weren't necessary: modifying the tracking code on every page of your site. Basically, you'll need to change the small snippet of code that used to refer to urchin.js so that it references ga.js, Google's new JavaScript tracking file.
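On a large static site, the tedious part is finding every page that still has the old snippet. Here's a small Python sketch that walks a document root and lists files that mention urchin.js but not ga.js. It's a crude text heuristic: the file extensions are assumptions about your site, and pages that build the snippet from server-side includes or templates won't be caught. It only finds pages; the replacement snippet itself should come from Google's migration guide.

```python
import re
from pathlib import Path

URCHIN = re.compile(r"\burchin\.js\b")
GA = re.compile(r"\bga\.js\b")  # word boundary so names like "omega.js" don't count


def needs_migration(page_text):
    """True if the page still references urchin.js and doesn't reference ga.js yet."""
    return bool(URCHIN.search(page_text)) and not GA.search(page_text)


def pages_needing_migration(site_root, suffixes=(".html", ".htm", ".php", ".asp")):
    """List files under site_root (recursively) that still appear to use the old snippet."""
    found = []
    for path in sorted(Path(site_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            if needs_migration(path.read_text(encoding="utf-8", errors="ignore")):
                found.append(path)
    return found
```

Usage: call `pages_needing_migration("/path/to/docroot")` (a placeholder path) and work through the list it returns.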

But not to worry. The team has assembled a 22-page Tracking Code Migration Guide (PDF) designed to, um, walk you through the process.

Beyond simply explaining how to update your code (which shouldn't be a problem if you installed the original code in the first place), the guide explains the benefits of the new system by showing additional features, such as:

  • Tracking virtual page views
  • Tracking downloaded files
  • Tracking a page in multiple accounts
  • Tracking subdomains
  • Track a visitor across domains using a link
  • Track a visitor across domains using a form
  • E-commerce transactions
  • Adding organic sources
  • Segmenting visitor types
  • Restrict cookie data to a subdirectory
  • Control data collection settings
  • Control session timeout
  • Control campaign conversion timeout
  • Custom campaign fields
  • Using the anchor (#) with campaign data
  • Setting keyword ignore preferences
  • Control the data sampling rate

Some of these features already exist in one form or another. For example, you can track file downloads by defining one of your conversions as such. But the new iteration promises more simplicity, which is never a bad thing.

Remember, as always, this is a beta release. (But you knew that, didn't you?) I haven't updated the code on our sites yet, so I can't vouch for any particular improvements. But I am eager to get into it and will certainly post any interesting tidbits right here.