How Many Hits Does Your Site REALLY Get?

It's been two years since my inaugural blog post on April 29th, 2006: The Trouble With RSS. Over my site's second year, I wanted to do some long-term analysis on how different web analytics tools track hits, visits, and the like. As expected, they don't agree with each other:

  • SiteMeter: 89,800 visits (132,000 hits)
  • Google Analytics: 84,000 visits (140,000 hits)
  • Webalizer: 431,000 visits (3,660,000 hits)

Curious about why web site statistics differ based on the tool? SiteMeter uses an embedded image (at the bottom of this page), and tracks a hit every time somebody loads the image... so if you block banner ads, your visit might not be recorded. Google Analytics loads some JavaScript, which is useful for tracking more complete data... but if your browser blocks JavaScript (or cross-domain JavaScript), it won't register a hit. I found it odd that SiteMeter tracked more visits, but fewer hits than Google Analytics... curious.

In contrast with the other two, Webalizer uses raw Apache logs to determine hit count, so it tracks every single dang hit... Over 3 million hits in one year??? That's clearly too many... I'm not that interesting... but the visit count might be more accurate. Webalizer is the only one of the three that tracks folks who view my site with RSS readers, which may hit my site several times per day... thus the higher visit count. The hit count is hyperinflated because it also counts search engine spiders, spammers, and hack attempts (some better than others).
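
To see how much of a raw log count is spiders and page furniture, here is a minimal Python sketch that splits hits from an Apache access log into bots, static assets, and likely page views. It assumes the logs are in Apache's "combined" format; the bot hints, asset suffixes, and sample lines are illustrative assumptions, not what Webalizer actually does.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative heuristics only; real bot and asset lists are much longer.
BOT_HINTS = ("bot", "spider", "crawler", "slurp")
ASSET_SUFFIXES = (".png", ".gif", ".jpg", ".css", ".js", ".ico")


def summarize(lines):
    """Count raw hits, then separate out bots and static assets."""
    totals = {"hits": 0, "bot_hits": 0, "asset_hits": 0, "page_views": 0}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        totals["hits"] += 1
        agent = m.group("agent").lower()
        path = m.group("path").split("?", 1)[0].lower()
        if any(hint in agent for hint in BOT_HINTS):
            totals["bot_hits"] += 1
        elif path.endswith(ASSET_SUFFIXES):
            totals["asset_hits"] += 1
        else:
            totals["page_views"] += 1
    return totals


# Made-up sample lines: one page load, its logo image, and one spider.
sample = [
    '203.0.113.5 - - [29/Apr/2008:10:00:00 -0500] "GET /blog/post HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows)"',
    '203.0.113.5 - - [29/Apr/2008:10:00:01 -0500] "GET /img/logo.png HTTP/1.1" '
    '200 900 "http://example.com/blog/post" "Mozilla/5.0 (Windows)"',
    '198.51.100.7 - - [29/Apr/2008:10:05:00 -0500] "GET /blog/post HTTP/1.1" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]
print(summarize(sample))
```

In this made-up sample, three raw hits boil down to a single page view by a person, which is the same kind of gap as the one between Webalizer's hit count and the other two tools.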

All told, if the majority of folks view my site with RSS, then Webalizer's count is more accurate. If most of them view it the old-fashioned way, then the other two are more accurate. I'm probably in the 100,000 - 200,000 visits per year range.

Unfortunately, none of these numbers include the folks who read my site through online RSS readers, like Google Reader or Bloglines. These sites hit my RSS feed once, then share it with dozens of folks who subscribe to the feed... To get a better estimate, I could pipe my RSS feed through something like Feedburner. Feedburner keeps track of how many subscribers you have on the online feed readers, and produces decent stats on it... however, once you move your feed to Feedburner, it's almost impossible to move it out... so I'm not happy with that option. Even so, that still wouldn't track those who view my content through RSS aggregators like Central Standard Tech, or Orana, or other sites that run Planet.
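
One rough workaround that stays on your own server: some hosted feed readers have reported a subscriber count inside the User-Agent string they send when fetching a feed. The sketch below assumes a format like "N subscribers" somewhere in that string; the reader names and sample strings are hypothetical, and it assumes all requests are for a single feed. Treat it as an estimate under those assumptions, not a replacement for real feed statistics.

```python
import re

# Assumed pattern: hosted readers announce "<N> subscribers" in the User-Agent.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def estimate_feed_readers(requests):
    """requests: iterable of (client_ip, user_agent) pairs for feed fetches."""
    hosted = {}     # reader name -> largest subscriber count it reported
    direct = set()  # desktop readers: treat each distinct IP as one reader
    for ip, agent in requests:
        match = SUBSCRIBERS.search(agent)
        if match:
            # Use the product token before the first "/" as the reader name.
            name = agent.split("/", 1)[0].split(";", 1)[0].strip()
            hosted[name] = max(hosted.get(name, 0), int(match.group(1)))
        else:
            direct.add(ip)
    return sum(hosted.values()) + len(direct)


# Hypothetical feed requests pulled from a log.
feed_requests = [
    ("192.0.2.1", "ExampleReader/1.0 (http://reader.example; 42 subscribers)"),
    ("192.0.2.1", "ExampleReader/1.0 (http://reader.example; 40 subscribers)"),
    ("192.0.2.9", "OtherReader/2.0 (+http://other.example; 7 subscribers; feed-id=1)"),
    ("203.0.113.5", "DesktopFeedApp/3.1"),
    ("203.0.113.5", "DesktopFeedApp/3.1"),
]
print(estimate_feed_readers(feed_requests))
```

The hosted readers show up as a couple of visits in the raw logs, but here they stand in for 49 subscribers, plus one person polling directly from a desktop reader.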

Well, what about the data from Alexa? That site ranks web pages based on those who surf the web with a toolbar that tracks their every move. Personally, I think people who surf with that toolbar are opening up a major security hole... so their viewing audience is probably restricted to folks who are kind of tech savvy, but don't take security precautions. In other words, newbie geeks. I've never broken into the top 100,000 sites ranked on Alexa, but I frequently break the top 100,000 sites ranked by Technorati... although Technorati only ranks blogs.

UPDATE: As Phil noted in the comments below, most people use Alexa just to boost their own page rank. For example, you could have your web team install and enable the Alexa toolbar, but only when browsing your own web page. That would make your Alexa rank huge without any actual hits from the greater internet...

Even if we could accurately count how many people hit the site, we're still at a loss to know who paid attention. Google Analytics tries to measure "time on the page"; other metrics include bounce rate, or even the number of comments.

Oh well... A reliable measure of relevance will always be elusive... but at least we have enough estimates to support a cottage industry of people analyzing those metrics to prove anything they are told to prove ;-).

Back to my anniversary... Lots of stuff has changed since my first anniversary post: I've traveled to South Africa, Brazil, and Argentina... I've remodeled my kitchen, I've nearly completed my second book on Oracle enterprise content management, I've given technology presentations at Oracle Open World, AIIM Minnesota, BarCamp Minnesota, and IOUG Collaborate in Denver. I've trained both salespeople and consultants on what Enterprise Content Management actually is, and I helped negotiate a settlement to an 18-month lawsuit against a local non-profit. Oh yeah... I implemented about a dozen ECM solutions as well...

Next year, I hope to have even more goin' on... and a few more web site visits.

Congratulations

You sure will get more and more popular. When is your next book going to be published? I am looking forward to it.

Cheers

Thanks

Hey,
I was doing some research on site hits. Thanks for the article... it helped a lot.

no prob...

@Kent, thanks for the compliments... my next book should be available in January, I believe. I'm nearly done, but then begins the editing process ;-)

@Visitor, yeah, a lot of people just turn on Apache log monitoring, and they have a "woo hoo!" party when they get a million hits... but they don't look at the detail to see that they probably only get 1/10th of that traffic. You need much better analytics... Google is nice, when it works.

It's more relative than absolute

Bex,

The relevance is in the ability to compare the statistics over time. Being able to see that certain content is more popular than other content helps you optimize the site. Bounce rate and time actually spent on the site are obviously interesting metrics in that regard as well.

What is important to take away from your article is that site statistics, if used to compare different sites, are only relevant when the same methods are used to measure them.

Cheers
- Boris

Hits

Your Apache stats count different things, so a page of text with 5 images would count as 6 hits. If you think about how a browser works, it pulls the HTML, reads it, then discovers what it needs to go back for, like images.
Also, if a visitor hits your site, leaves, then comes back 5 minutes later, is that a separate visit or part of the same visit? Different tools count these differently.
A lot of the people who use Alexa use it to boost their own site stats; Jason Calacanis, for example, makes all the Mahalo admins use it to make Mahalo look like it's really popular.
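
The "separate visit or same visit" question can be sketched in a few lines of Python. The function below groups hits into visits using an inactivity timeout; the visitor ids, timestamps, and timeout values are made up for illustration, and each analytics tool picks its own rule.

```python
from collections import defaultdict


def count_visits(hits, timeout_seconds):
    """hits: iterable of (visitor_id, unix_timestamp) pairs.

    A visitor's hit starts a new visit when it comes more than
    timeout_seconds after that visitor's previous hit.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp in hits:
        by_visitor[visitor].append(timestamp)

    visits = 0
    for stamps in by_visitor.values():
        stamps.sort()
        visits += 1  # the first hit always opens a visit
        for previous, current in zip(stamps, stamps[1:]):
            if current - previous > timeout_seconds:
                visits += 1
    return visits


# Visitor "a" loads two pages, leaves, and returns 5 minutes later.
hits = [("a", 0), ("a", 10), ("a", 310), ("b", 5)]
print(count_visits(hits, timeout_seconds=1800))  # 30-minute rule
print(count_visits(hits, timeout_seconds=60))    # 1-minute rule
```

With the 30-minute rule the same four hits are 2 visits; with the 1-minute rule they are 3. Same traffic, different number, purely because of the counting rule.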

helpful

Yes, the article was great.

Thanks

Thanks! Helpful article. =)

The article is awesome!!

The article is awesome!! It's very useful for me and helps me a lot!!

I think webalizer is counting

I think Webalizer is counting all the robot visits from the search engines, the ones that spider your website daily to check for changes. That is what drives the traffic numbers so high. From Webalizer's detailed stats you could see what kind of traffic loads the site and what caused that huge difference from the other analytics programs. But as you said, the best way is to study the user behavior on the site and not the number of users.

If users leave your site in a second it doesn't matter much how many visitors you can get.

The problem with RSS

I have been searching for an answer to the same question you've posed here: how do you accurately measure traffic from RSS subscribers? While hit counts are only one measure of effectiveness, they are nonetheless important, even if only from an author's perspective: how do you know which posts readers found most helpful or interesting?

I recently wrote two posts on my blog about how to improve site traffic metrics using WordPress. The first one explains how to tweak the RSS settings and use excerpts while the second one explains how to improve home page layout to drive people into specific posts.

I'd love to know what you think about these techniques for improving site data!
