Victor's Blog about the Web, Security and Life

The web for me is a hobby where standards and best practices are daily bread. Security is a concern whose details everybody must be aware of if IT in general, and the web in particular, is to be a safer place. My life, on the other hand, is that of a regular Lebanese citizen where politics and social issues are discussed on a daily basis. I hope you enjoy reading my blog; make sure to drop me a comment about any topic you find interesting.

Google Analytics vs Log File Based Statistics

victor | 19 April, 2010 10:46

The way Google Analytics works is very different from the way log file based statistics work. Log file based statistics follow a very clear process: the tool opens the log file on the server side, parses its contents and generates results based on that content. Google Analytics, on the other hand, uses JavaScript to create a cookie in the user's browser and uses that cookie (and JavaScript) to track the user's behavior on the website.
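To make the log file process concrete, here is a minimal Python sketch of the "open, parse, generate results" steps. It assumes the server writes the Common Log Format; the `summarize` function, the regular expression and the sample lines are illustrative assumptions, not the code of any particular statistics tool.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def summarize(lines):
    """Return (hits, total bytes, hits per path) for an iterable of log lines."""
    hits = 0
    total_bytes = 0
    per_path = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        hits += 1
        size = match.group("size")
        total_bytes += 0 if size == "-" else int(size)
        per_path[match.group("path")] += 1
    return hits, total_bytes, per_path

# Made-up sample lines; a real tool would read them from the log file.
sample = [
    '203.0.113.5 - - [19/Apr/2010:10:46:00 +0300] "GET /index.php HTTP/1.1" 200 5120',
    '203.0.113.5 - - [19/Apr/2010:10:46:01 +0300] "GET /style.css HTTP/1.1" 200 1024',
    '198.51.100.7 - - [19/Apr/2010:10:47:30 +0300] "HEAD /index.php HTTP/1.1" 200 -',
]
hits, total_bytes, per_path = summarize(sample)
print(hits, total_bytes, per_path.most_common(1))
```

Every request the server answered ends up in these totals, whether or not a browser ever ran JavaScript for it.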

Although both tools provide VISITS statistics, Google Analytics tends to be more accurate here, because cookies can tell apart two users who share the same IP address (for example, users on the same network behind one proxy), while IP-based differentiation (the technique used by Log File Based Statistics) cannot.
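A small Python sketch of that difference, using made-up data: each hit is a pair of IP address and visitor cookie, and the two counting techniques are applied to the same hits. The addresses and cookie values are assumptions for illustration only.

```python
# Each hit is (ip_address, visitor_cookie). All values are made up.
hits = [
    ("203.0.113.5", "cookie-a"),
    ("203.0.113.5", "cookie-b"),  # a second person behind the same IP address
    ("203.0.113.5", "cookie-a"),
    ("198.51.100.7", "cookie-c"),
]

# IP-based differentiation: one visitor per distinct IP address.
visitors_by_ip = len({ip for ip, _ in hits})

# Cookie-based differentiation: one visitor per distinct cookie.
visitors_by_cookie = len({cookie for _, cookie in hits})

print(visitors_by_ip, visitors_by_cookie)
```

In this sample the IP-based count merges the two people behind 203.0.113.5 into one visitor, while the cookie-based count keeps them apart.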

When it comes to accuracy, the two tools are good at different things. The first tool, Log File Based Statistics, is intended for accurate numbers of hits, pages, bandwidth consumption, server load, etc. The second tool, Google Analytics, is intended for tracking user behavior: number of visitors, entry pages, exit pages, landing pages, etc.

The above general description is crucial for website administrators, both to understand where each tool is more effective and to highlight the fact that using one tool does not eliminate the need for the other. Both tools are still needed to get the full picture properly (and effectively). If your concern is only statistical (hits, pages, consumption, etc.), Log File Based tools are the key. If your concern is marketing-oriented (i.e. visitors, behaviors, etc.), then Google Analytics is king.

Why Are Google Analytics Numbers Lower Than Those of Log File Based Statistics?

The following list is not exhaustive, but it covers the most common cases where the numbers will look different:
1- JavaScript and Cookies: Google Analytics relies on JavaScript and Cookies. As such, user agents that do not support both will not be counted. Examples of such user agents are: PDAs, a large share of hand-held devices, many mobile phones with default settings, referrals (websites that link directly to parts of your website like an image, video, etc.), computers with hardened security settings, search engine bots, etc.

2- Cache Engines: Some cache engines serve pages to visitors from their own cache without going back to the original website. In this case, the cache engine sends a lightweight request to the server, such as a HEAD request or a conditional GET, to check whether the page has been modified since it was last fetched. That request is counted by Log File Based Statistics but not by Google Analytics, because the cache engine does not run the tracking JavaScript. In addition, some cache engines only check for updates to the main page files (like PHP) and not to each file the page includes (like JavaScript or CSS files).

3- Visits Calculation Algorithm: the two tools are very different in how they identify a visit. Log file based statistics tie an IP address, within a certain window of activity, to a visit. In other words, if an IP address is active within the log file (fetching content) for 20 minutes, then inactive for 30 minutes, and then active again, Log File Based Statistics might count this activity as 2 distinct visits from the same IP address. Google Analytics, on the other hand, uses the cookie to track the visit and, thus, can be more relaxed about when the visit times out. There are unconfirmed reports that Google Analytics might treat any activity within the next 3 hours as part of the current visit, although the default timeout that Google documents is 30 minutes of inactivity. Either way, because the two tools identify visitors and time out visits differently, don't expect their visit counts to match.
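The effect of the timeout in point 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `count_visits` function is hypothetical, the 20-minute and 3-hour timeouts are example values taken from the scenario above, and neither tool is claimed to work exactly this way.

```python
from datetime import datetime, timedelta

def count_visits(hits, timeout):
    """Count visits: a hit opens a new visit when its key (an IP address or
    a cookie) has been idle for longer than `timeout`."""
    last_seen = {}
    visits = 0
    for key, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(key)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[key] = when
    return visits

# One visitor: active from 10:00 to 10:20, idle for 30 minutes, active again.
day = datetime(2010, 4, 19)
hits = [
    ("203.0.113.5", day.replace(hour=10, minute=0)),
    ("203.0.113.5", day.replace(hour=10, minute=10)),
    ("203.0.113.5", day.replace(hour=10, minute=20)),
    ("203.0.113.5", day.replace(hour=10, minute=50)),
]

strict = count_visits(hits, timeout=timedelta(minutes=20))
relaxed = count_visits(hits, timeout=timedelta(hours=3))
print(strict, relaxed)
```

With the shorter timeout the 30-minute gap splits the activity into two visits; with the longer one the same hits form a single visit, which is why the totals of the two tools drift apart.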
