Understanding Scope In Google Analytics Reporting
Google Analytics makes it easy to collect and report on data, often without requiring you to understand how that data is organized. Yet complicated questions or seemingly odd behavior can usually be explained by understanding the model that Google Analytics uses to collect and store data. Let’s start at the beginning:
Data collected in Google Analytics can fall into one of two categories: dimensions or metrics. Jim explained the difference between them in this blog post.
However, not every dimension-metric combination can be analyzed in standard reports within Google Analytics. For example, the All Pages report gives you details about each page on your site: the number of Pageviews, Unique Pageviews, and Entrances for each page, along with its Average Time on Page, Bounce Rate, % Exit, and Page Value. But you don’t see metrics like Users or Sessions.
You use GA because you want to know as much as possible about your website’s performance, so why aren’t these additional metrics available out of the box? This might look like a limitation of the standard reports, but the omission is deliberate, and understanding why these metrics are left out is key to creating meaningful and accurate reports from your data.
What is Scope?
These dimensions and metrics are kept separate because of the way data is defined and collected in Google Analytics. Scope is a characteristic of every dimension and metric, and each dimension or metric has exactly one scope. GA data is organized into four scopes:
- User data
- Session data
- Hit data
- Product data (ecommerce)
A hit is any single action on a website that GA records, such as a pageview or an event triggered by watching a video or downloading a PDF. Hits can also have products associated with them.
A session is a group of one or more hits from the same user within a certain time frame. Everything one person does on your site during a single browsing session, such as the pages they load and the files they download, is connected into one session.
A user is the highest level of data collected, and it is the crucial piece that connects past and future web behavior. Specifically, Google Analytics stores a client ID for each user that visits your site and then joins together all of the sessions that share the same client ID. The client ID is generated by the Google Analytics tracking code and stored in a cookie in the visitor’s browser. This distinction is important: users are cookies, not individual people, because the cookie is tied to the browser, not the person. A person who visits a website in Chrome at work and then visits the same site in Firefox on their home computer is therefore counted as two users, one for each browser’s cookie.
Users are made up of one or more sessions, sessions are made up of one or more hits, and hits can have one or more products associated with them.
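To make the hierarchy concrete, here is a minimal Python sketch of it as nested data classes. The class, field, and function names (`User`, `Session`, `Hit`, `Product`, `scope_counts`) are hypothetical and do not correspond to any GA API. The sketch only shows that each level contains the level below it, and that nothing at a lower level points back up to the level above.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified model of GA's scope hierarchy:
# user -> sessions -> hits -> products. Names are illustrative only.
@dataclass
class Product:
    sku: str
    quantity: int = 1


@dataclass
class Hit:
    hit_type: str  # e.g. "pageview" or "event"
    timestamp: int  # seconds since epoch
    products: list[Product] = field(default_factory=list)


@dataclass
class Session:
    hits: list[Hit] = field(default_factory=list)


@dataclass
class User:
    client_id: str  # cookie-based identifier, not a person
    sessions: list[Session] = field(default_factory=list)


def scope_counts(user: User) -> dict[str, int]:
    """Count the sessions, hits, and products below a single user."""
    hits = [h for s in user.sessions for h in s.hits]
    products = [p for h in hits for p in h.products]
    return {"sessions": len(user.sessions), "hits": len(hits), "products": len(products)}


user = User(
    client_id="1234.5678",
    sessions=[
        Session(hits=[Hit("pageview", 100), Hit("event", 160)]),
        Session(hits=[Hit("pageview", 9000, products=[Product("SKU-1"), Product("SKU-2")])]),
    ],
)
print(scope_counts(user))  # {'sessions': 2, 'hits': 3, 'products': 2}
```

Note that a `Product` has no reference to its `Hit` and a `Hit` has no reference to its `Session`, which mirrors the one-way relationships discussed below.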
Any dimension or metric with a user-level scope relates to an aspect of a user. Some of the most commonly used include:
| Dimension | Metric |
|---|---|
| Session Count | New Users |
Session-level dimensions and metrics describe attributes of a single session. They include:
| Dimension | Metric |
|---|---|
| Campaign | Average Session Duration |
Hit-level dimensions and metrics refer to features of a single hit. They include:
| Dimension | Metric |
|---|---|
| Hostname | Time on Page |
| Event Category | Total Events |
Why is Scope Important?
As I said above, users are made up of one or more sessions, sessions are made up of one or more hits, and hits can have one or more products associated with them. It is important to recognize that this hierarchy generally works in only one direction. For example, sessions have hits, but hits can’t have sessions; hits can have products, but products can’t have hits. Hits do have users, however. To understand why that is allowed while hits still can’t have sessions, we need to break down how GA processes data.
Hits are the building blocks of GA. Every hit carries the client ID (mentioned above) along with a lot of other information, such as the type of hit being sent (pageview, event, etc.) and the time the hit was made. Hits are the pieces of information that come into GA to be processed. Some hits are kept and some are thrown away, based on the filters you have set up for your view. The hits are then ordered chronologically by timestamp and connected to one another by client ID. Only then does GA work out which session each hit belongs to, by looking at the time between hits.
As a result of this process, hits do not have a session ID attached to them. So while we can say “hits have users,” because hits contain client ID information, we cannot say “hits have sessions,” because they contain no session information. And even if there were a way to link hits to sessions, hit-level reports would likely contain duplication. Hit-to-session reporting therefore isn’t compatible, which is why we say “hits cannot have sessions.”
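The grouping step described above can be sketched in a few lines of Python. This is a simplified model, not GA’s actual processing code: it assumes a session ends after 30 minutes of inactivity (GA’s default timeout) and ignores other session-ending rules. `Hit`, `sessionize`, and `SESSION_TIMEOUT_SECONDS` are hypothetical names. Notice that the input hits carry only a client ID and a timestamp; sessions exist only as the output of the grouping.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: sessions end after 30 minutes of inactivity (GA's default
# timeout). Other session-ending rules (e.g. midnight, campaign changes)
# are deliberately left out of this sketch.
SESSION_TIMEOUT_SECONDS = 30 * 60


@dataclass(frozen=True)
class Hit:
    client_id: str  # every hit carries the client ID...
    timestamp: int  # ...and a timestamp (seconds), but no session ID
    page: str


def sessionize(hits: list[Hit]) -> dict[str, list[list[Hit]]]:
    """Group hits by client ID, order them by time, then split them into sessions."""
    by_client: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_client[hit.client_id].append(hit)

    sessions: dict[str, list[list[Hit]]] = {}
    for client_id, client_hits in by_client.items():
        client_hits.sort(key=lambda h: h.timestamp)
        client_sessions = [[client_hits[0]]]
        for prev, cur in zip(client_hits, client_hits[1:]):
            if cur.timestamp - prev.timestamp > SESSION_TIMEOUT_SECONDS:
                client_sessions.append([])  # gap too long: start a new session
            client_sessions[-1].append(cur)
        sessions[client_id] = client_sessions
    return sessions


hits = [
    Hit("A", 600, "/pricing"),
    Hit("B", 100, "/blog"),
    Hit("A", 0, "/home"),
    Hit("A", 600 + 45 * 60, "/home"),  # 45 minutes after the previous hit
]
result = sessionize(hits)
print([len(s) for s in result["A"]])  # [2, 1]
print(len(result["B"]))  # 1
```

Because the session boundaries are derived rather than stored on each hit, there is no session ID on a hit to join against, which is the point made above.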
Take scope into consideration both when reporting on your data and when setting up custom dimensions and metrics.
This is especially relevant when it comes to reporting. As mentioned above, GA’s built-in reports don’t let you make invalid dimension-metric combinations. However, if you’re creating custom reports or reports with the API, there are fewer restrictions: you can combine almost any dimension and metric you want, which is not necessarily a good thing.
The first thing to ask when creating a report is, “Does this dimension-metric combination make sense given the way Google Analytics collects data?” Otherwise, you might create reports that don’t mean what you expect them to mean.
For example, if you combine Page with Sessions, the resulting table does not show the number of sessions in which each page was viewed. Instead, it shows how many sessions began on each page. To get the information you’re actually looking for, combine Page with a hit-level metric such as Unique Pageviews, which counts the number of sessions during which the page was viewed at least once.
When you combine any hit-level dimension with a session-level metric, the metric contains data only from the first hit of the session. Becky explains this misunderstanding in more depth here, along with other mistakes you can make while reporting.
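The difference between Page × Sessions and Page × Unique Pageviews can be reproduced with a toy calculation. This Python sketch models the behavior described above, where a session-level metric is credited only to the session’s first hit; it is an illustration, not GA’s implementation. The sample sessions and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages viewed in it.
sessions = [
    ["/home", "/pricing", "/home"],
    ["/blog", "/pricing"],
    ["/pricing"],
]


def pageviews_by_page(sessions):
    """Page x Pageviews: every view counts, including repeats."""
    return Counter(page for s in sessions for page in s)


def unique_pageviews_by_page(sessions):
    """Page x Unique Pageviews: sessions in which the page was viewed at least once."""
    return Counter(page for s in sessions for page in set(s))


def sessions_by_page(sessions):
    """Page x Sessions: each session is credited only to its first hit."""
    return Counter(s[0] for s in sessions if s)


for page in ["/home", "/pricing", "/blog"]:
    print(
        f"{page}: pageviews={pageviews_by_page(sessions)[page]}, "
        f"unique={unique_pageviews_by_page(sessions)[page]}, "
        f"sessions={sessions_by_page(sessions)[page]}"
    )
# /pricing was viewed in 3 sessions, but only 1 session began there.
```

Reading the Sessions column as “sessions that viewed this page” would understate /pricing by a factor of three in this toy data.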
Similarly, you can’t combine:
- Events and Goal Completions
  - You might want to see how many events were counted as goal completions. Although a goal can be defined by an event, event dimensions cannot be broken down by Goal Completions, because goals are session-level while events are hit-level.
- Products and Total Events
  - You might be interested in how many people who bought a product also completed an event. Combining a product-level dimension with a hit-level metric does not work in this case; the combination produces a table with no data.
- Pages and Goal Completions
  - Another thing you might want to know is the page on which each goal completion occurred. The Goal Completion URL dimension gives you some of this detail, but you cannot combine the Page dimension with Goal Completions. The reason relates back to Events and Goal Completions above: goals are defined at the session level, while pages are hit-level. Instead, you could use sequence segments for conversions to analyze this type of data.
In each case, the dimension-metric combination produces a table that doesn’t make sense given the way GA defines and processes data.
Custom Dimensions and Metrics
Scope is extremely important to keep in mind when setting up custom dimensions and custom metrics, because you decide the scope of each one yourself.
Base your decision on both the data you expect the dimension or metric to collect and how you ultimately want to report on it. Does the information relate only to the current hit, does it tell you more about the browsing session, or does it describe something about the user that you want to remember across future sessions?
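To see why the choice matters, here is a simplified Python model of how a custom dimension value set on some hits would be applied to other hits under each scope. It assumes the rules Google documents for Universal Analytics custom dimensions, simplified: a hit-scoped value stays on its own hit, a session-scoped value (the last one set in the session) applies to every hit in that session, and a user-scoped value also carries forward into later sessions until it changes. Product scope is omitted, and `apply_scope` is a hypothetical helper, not a GA API.

```python
from typing import Optional

# Each session is a list of hits; each hit is the custom dimension value it
# sets, or None if it sets nothing. Simplified, hypothetical model.
Sessions = list[list[Optional[str]]]


def apply_scope(sessions: Sessions, scope: str) -> Sessions:
    """Return the value each hit would be reported with under the given scope."""
    result: Sessions = []
    carried: Optional[str] = None  # user scope: value persists across sessions
    for session in sessions:
        set_values = [v for v in session if v is not None]
        last_in_session = set_values[-1] if set_values else None
        if scope == "hit":
            result.append(list(session))  # each value stays on its own hit
        elif scope == "session":
            result.append([last_in_session] * len(session))
        elif scope == "user":
            if last_in_session is not None:
                carried = last_in_session
            result.append([carried] * len(session))
        else:
            raise ValueError(f"unknown scope: {scope}")
    return result


# Three sessions for one user: "gold" is set mid-session 1, nothing is set
# in session 2, and "silver" is set on the first hit of session 3.
sessions = [[None, "gold", None], [None, None], ["silver", None]]
for scope in ("hit", "session", "user"):
    print(scope, apply_scope(sessions, scope))
```

The same three raw values produce three very different reports, which is why it pays to decide up front whether a value describes a hit, a session, or a user.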
Amanda discusses the set-up and use of custom dimensions in this post and Dorcas explains how to report on custom dimensions here.
There Are Exceptions
As I’ve said, combining dimensions and metrics across scope can be extremely misleading. This can create reports that either make no sense at all or, at the very least, mean something different from what you would expect. But sometimes, you do get the report you were expecting.
For example, if you create a custom report that combines the hit-level dimension Page with the user-level metric Users, you would expect a table showing how many users visited each page of your site. As noted earlier, hits do contain user information (the client ID), so in this case that is exactly the report you get.
The lesson: avoid combining dimensions and metrics across scope when you’re reporting, and if you do combine across scope, think about the data! Check your results against other data in GA to be sure the table is showing you what you actually want to see. Keep in mind how GA processes the data: a client ID is associated with every hit, but a session ID is not. And as a rule, never combine hit-level dimensions with session-level metrics.
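Because every hit carries a client ID, Page × Users reduces to counting distinct client IDs per page. The Python sketch below shows that calculation on hypothetical sample hits; `users_by_page` is an illustrative name, and GA’s own figures may differ from an exact count (for example, when a report is sampled).

```python
from collections import defaultdict

# Hypothetical raw hits as (client_id, page) pairs. Every hit carries a
# client ID, so distinct users per page can be computed directly.
hits = [
    ("111.1", "/home"),
    ("111.1", "/pricing"),
    ("111.1", "/home"),  # same user viewing /home again
    ("222.2", "/home"),
    ("333.3", "/pricing"),
]


def users_by_page(hits):
    """Page x Users: number of distinct client IDs that viewed each page."""
    clients = defaultdict(set)
    for client_id, page in hits:
        clients[page].add(client_id)
    return {page: len(ids) for page, ids in clients.items()}


print(users_by_page(hits))  # {'/home': 2, '/pricing': 2}
```

/home received three pageviews here but only two users, because the repeat view shares a client ID with an earlier hit.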