Statistical Significance Chrome Extension For Google Analytics
In March I wrote a script for the statistical evaluation of time-frame comparisons in Google Analytics. The idea seemed well received, but who wants to hit F12, open the developer console, and then come back to my blog post for the code every time they want to run the script?
So, I converted the script into a Chrome Extension (click below)!
This Chrome Extension also extends the original functionality of the script by using Student’s t-test for comparisons with more than 40 data points. Additionally, bugs in the first edition prevented the script from working with certain metrics (E-commerce revenue, bounce rate, etc.). This new version should work with all graphable metrics.
As I suggested in my original post, it’s important to incorporate statistical significance into our interpretation of Google Analytics data. Without at least some consideration of the statistics behind our data analyses, we risk mistaking false positives for meaningful factors and misallocating resources. If you need a quick brush-up on the basics and purpose of statistical testing, I recommend taking a look at Statistical Significance in a Testing World by Adam Sugano (check out his original article as well).
While my Chrome extension aims to extend the strengths of Google Analytics (namely, responsive ad-hoc queries and visual comparison), analysts wishing to conduct a more detailed statistical analysis should turn to the R Google Analytics project. If you have never used R, but you paid a bit of attention in your college stats class and are familiar with a scripting language, you should be able to jump right in.
Matt Clarke at TechPad provides a great tutorial on connecting your Google Analytics account with R.
Getting back to this Chrome extension, there are several examples of when we would make temporal (time-frame) comparisons of metrics in Google Analytics. In each of the following examples, we should be comparing engagement metrics before and after the event. We should also ensure that any percentage changes reported in Google Analytics are statistically significant.
- Website updates have been pushed
- Website has experienced a change in exposure (a mention by a large news source, etc.)
- New marketing campaigns
- Concerns that a search algorithm change is affecting traffic
- Concerns that a change in traffic demographics is affecting conversions
Extensions / Feedback
Now the most important part of this post. What do you think about the approach of incorporating more statistical consideration into our Google Analytics analysis? Worthwhile? Waste of time?
And what do you think about the current state of the Chrome extension? What types of improvements would you most like to see? Some potential improvements might include:
- Support for comparison of advanced segments (pairwise / aggregate)
- More-detailed output (average, median, average difference, per-day)
- Support for exporting graph data points/calculated statistics as a CSV
Note: n is the number of data points (the number of hours, days, weeks, or months in your timeline). If you are looking at a full year’s worth of data at monthly points, n = 12.
- For 6 < n <= 40, the Wilcoxon Signed-Rank test is used. The exact p-value is not returned; the result is only tested at the 0.1, 0.05, and 0.01 significance levels.
- For 40 < n < 58, Student’s Paired t-test is used. The exact p-value is returned.
- Updated to work with all graphable metrics
- Warns if start date is not the same weekday for the two time-frames to be compared
- Will not run if the time-frames are different lengths
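The two time-frame checks listed above can be illustrated with a minimal Python sketch. This is not the extension's actual code; the function name `check_time_frames`, the inclusive date ranges, and the message wording are all assumptions made for illustration.

```python
from datetime import date


def check_time_frames(start_a, end_a, start_b, end_b):
    """Hypothetical helper: validate two inclusive date ranges before comparing them.

    Returns (ok, messages). ok is False when the ranges differ in length
    (the comparison should not run); a weekday mismatch only adds a warning.
    """
    len_a = (end_a - start_a).days + 1
    len_b = (end_b - start_b).days + 1
    if len_a != len_b:
        return False, [f"Time-frames differ in length: {len_a} vs {len_b} days"]

    messages = []
    if start_a.weekday() != start_b.weekday():
        messages.append("Warning: time-frames start on different weekdays")
    return True, messages


# Two consecutive 7-day ranges that start on the same weekday
ok, messages = check_time_frames(
    date(2013, 6, 3), date(2013, 6, 9),
    date(2013, 5, 27), date(2013, 6, 2),
)
print(ok, messages)
```

The weekday check matters because many sites have a strong weekly traffic cycle; if the two ranges are offset by a day, each paired data point compares, say, a Monday against a Tuesday.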
We make several assumptions about the data’s distribution in order to use the Wilcoxon Signed-Rank test and the Student’s Paired t-test. We believe these assumptions are reasonable for most Google Analytics data.
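To make the test selection concrete, here is a rough Python sketch of choosing a test from n and computing the two test statistics on paired data points. It is an illustration under stated assumptions, not the extension's implementation: the function names are hypothetical, returning `None` outside the documented ranges is an assumption, and the final step of turning a statistic into a p-value or a significance-level decision is left out.

```python
import math
from statistics import mean, stdev


def choose_test(n):
    """Pick a test from the number of paired data points, using the ranges above."""
    if 6 < n <= 40:
        return "wilcoxon"
    if 40 < n < 58:
        return "t-test"
    return None  # assumption: no test is run outside the documented ranges


def paired_t_statistic(before, after):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def wilcoxon_w(before, after):
    """Wilcoxon signed-rank statistic: the smaller of the two signed rank sums.

    Zero differences are dropped; tied absolute differences share an average rank.
    """
    diffs = sorted((a - b for a, b in zip(after, before) if a != b), key=abs)
    w_plus = w_minus = 0.0
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for d in diffs[i:j]:
            if d > 0:
                w_plus += avg_rank
            else:
                w_minus += avg_rank
        i = j
    return min(w_plus, w_minus)


# Made-up daily values for two 7-day time-frames (n = 7)
before = [10, 12, 9, 11, 14, 10, 13]
after = [12, 11, 12, 15, 13, 16, 18]
print(choose_test(len(before)), wilcoxon_w(before, after))
```

Both statistics work on the per-point differences between the two time-frames, which is why the time-frames must be the same length: each hour, day, week, or month in one range needs a partner in the other.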