Ultimate Guide to Querying Google Analytics 4 Properties or Firebase Data in BigQuery
Editor's Note: This article is about Universal Analytics properties, which will stop processing data on July 1, 2023 (July 2024 for Analytics 360 properties). For Tracking Single Page Applications with Google Analytics 4 Properties, please refer to this article instead.
If you’ve been on the fence about implementing Google Analytics 4 Properties (and/or Firebase Analytics), let us incentivize you to take the plunge: the BigQuery connection is free for all Google Analytics 4 Properties (formerly App + Web)!
While subscribers of Google Analytics 360 will be familiar with the power and flexibility that BigQuery offers, users of the free tier of Google Analytics have never before been offered this opportunity. With Google Analytics 4 Properties and Firebase, all product users are given access to hit-level unsampled data—both in the interface and most importantly, within Google BigQuery.
In the following discussion, we’ll provide resources for getting started with BigQuery and understanding the functionality of the tool. We’ll also go past implementation to discuss the important features of our Google Analytics 4 Properties event schema, the new data model, and any gotchas between historical Google Analytics web data and Google Analytics 4 Properties information (including Firebase data). Our goal is to leave you feeling confident in your next steps, and also give you a reference for any future questions that may come your way.
Table of Contents:
- What’s the Deal with BigQuery, Google Analytics 4 Properties, & Firebase
- I’m Sold—How Do I Get Connected
- The (Not-so-New) Events Table
- Querying the New Google Analytics 4 Properties Data Model
- Google Analytics 4 Properties & Firebase Query Strategies
- Gotchas When Working With BigQuery Google Analytics 4 Properties Data
- Final Comparisons Between Google Analytics 4 Properties & Universal Analytics (GA360) in BigQuery
- Reference Material & Additional Reading
What’s the Deal with BigQuery, Google Analytics 4 Properties, & Firebase?
If you’ve been working with Google Analytics or Google Marketing Platform data for a while, chances are that you’ve heard of Google BigQuery, Google Cloud Platform’s advanced database solution. While we praise the analytics interface for its accessibility and reporting ease, BigQuery provides our organizations with the opportunity to conduct a much deeper and more customized analysis for our business. Instead of focusing on predefined metrics and overall trend analysis, BigQuery enables us to uncover insights that are simply not possible without a little data transformation.
Additionally, BigQuery enables us to connect our Google Analytics and Firebase data with other third-party data sources, and even export our data to our visualization platforms like Data Studio, Tableau, PowerBI, among others. While there's now a native Data Studio Connector to Google Analytics 4 Properties, you may also want the flexibility of the Data Studio BigQuery connector for more complex reporting options
Since Data Studio offers several benefits to most organizations, including branding capabilities, enhanced sharing functionality, and calculated fields (among others), connecting this tool to customized reporting tables may be the only nudge that you need to get started with BigQuery. However, if you need a little bit more convincing that BigQuery is for you and your organization, check out my colleague’s blog post on the topic.
I’m Sold—How Do I Get Connected
Before we can dive into our data and begin writing SQL queries, we’ll need to connect our Google Analytics 4 Properties and/or Firebase projects to Google Cloud Platform.
For a step-by-step guide on how to get connected, visit my other post on the topic, Google Analytics 360 Starter Guide For Google BigQuery.
Note: If you have Google Analytics 360 but haven’t set up Google Analytics 4 Properties or Firebase yet, you can (and should!) also link your Universal Analytics property to BigQuery. .
The (Not-so-New) Events Table
Once your Firebase and/or Google Analytics 4 property is connected to BigQuery, you will start to see several datasets available to you for reporting. While we’re primarily interested in Google Analytics data for this discussion, which does not require the use of Firebase or mobile applications, it's useful to note that Firebase gives you access to several datasets that may be of interest to your company.
Google Analytics 4 Properties & Firebase Data in BigQuery
When you link your Firebase Project to BigQuery, there will likely be several supported and recommended product integrations for reporting.
- Crashlytics enables you to export fatal, non-fatal, and stack trace data for each of your apps
- The Cloud Messaging export enables you to evaluate whether the messages you send reach your users
- The Performance export creates a table that includes all the captured performance events
- The Predictions export creates a table for each app in the project that includes the Predictions data for each user
- You’ll also be able to access all analytics data related to your Firebase A/B tests in BigQuery
Today, we’re going to focus on the Google Analytics export, otherwise known as the Events table in BigQuery. However, if you are using Firebase, be sure to take advantage of these additional features and data sets when connecting BigQuery. If you do not have a mobile application, but are using a Google Analytics 4 property for your web streams, you will only need access to the Events table.
The Google Analytics Events Table
In BigQuery, your Google Analytics 4 Properties & Firebase tables will be located within your analytics dataset, and will begin with
events_(). If you’ve already linked your Firebase mobile app data to BigQuery and are simply extending this integration with web data streams, this next section will look very familiar to you because the table structure is the same! The primary difference in our data will be the addition of data streams and several new fields for web activity.
A single dataset named "analytics_(your property id)" will populate for each Firebase and/or Google Analytics 4 Properties project that is linked to BigQuery. If you have Firebase and Data Streams in Google Analytics 4 Properties, each app and data stream for which BigQuery exporting is enabled will export its data to that single dataset.
For each day of export, a table is imported within each analytics dataset. These tables will be formatted and separated by date, and appear as "events_YYYYMMDD." Because the events table is processed daily (and stored in an individual table per day), the numbers next to the event table (e.g. 145 below) represent the number of daily partitioned tables available to you for analysis. The events tables are on a one-day lag, meaning that your Tuesday, August 2nd data will arrive on Wednesday, August 3rd, for example.
Additionally, a table may be imported for Google Analytics 4 Properties events received throughout the current day. This table is named "events_intraday_YYYYMMDD" and it is populated in near real-time as app events are collected.
The Google Analytics Sessions Table
The new Google Analytics 4 Properties format differs slightly from our traditional Google Analytics 360 daily and intraday exports. In our Google Analytics 360 exports, each dataset corresponds to a view within Google Analytics rather than a Firebase project. Although we receive a daily
ga_sessions table once we are integrated with Google Analytics, the table schemas themselves are also very unique.
This distinction is important, and isn’t just naming conventions! Our events tables are structured very differently from our sessions tables, and therefore will be queried in different ways. We’ll dive deeper into our queries in a moment.
The Nested Table Structure
Similar to our Google Analytics 360 sessions table export, our Google Analytics 4 Properties events tables feature a nested and repeated fields structure.
In our Google Analytics 4 Properties and Firebase tables, each row represents a single event. For example, in the image below we see a
screen_view event, which houses several associated (and nested) event parameters and several columns of dimensions and metrics. Our events will contain user and event information, event parameters, and user properties.
To reference a complete list of columns and data, you’ll want to review the Google Analytics 4 Properties and Firebase export schema.
This simplified Google Analytics 4 Properties data model is a stark contrast to our Google Analytics 360 export where each row represents a single session. For example, in the image below we see a single session denoted by a visitID and a visit start time, followed by several associated (and nested) columns composed of dimensions and metrics. In our traditional Google Analytics exports, one session will compose all events, custom dimensions, and interactions. You may want to refer back to the Google Analytics export schema to isolate each column available for reporting.
Because Google Analytics data is hierarchical, you will often find yourself needing to write a query that accesses values from nested or repeated fields. In Universal Analytics (Google Analytics 360), this may be accessing custom dimension values, for example. In Google Analytics 4 Properties and Firebase, this may be accessing event parameters.
In general, we’ve found that the hierarchical structure of our new Firebase and Google Analytics 4 Properties tables is far less complicated than that of our traditional Universal Analytics properties. If the idea of unnesting or flattening Google Analytics tables in BigQuery seemed daunting in the past, don’t fret! Building a query in Google Analytics 4 Properties is generally much easier than in Universal Analytics.
The UNNEST function
To start to query your Google Analytics 4 Properties data, you’ll often need to flatten, or create repeated rows, for each event. This is because common data points such as
user_params are nested within each event.
In the following example, we see that each event parameter (level or score, for example) are contained within the same event row.
SELECT * FROM `firebase-public-project.analytics_153293282.events_*` -- update to your table WHERE _table_suffix = '20181003' -- update to your desired start and end date AND event_name = 'post_score'
When you use the
UNNEST function, you’re essentially taking each parameter, and attaching it to a new column where any aggregated column is repeated. In this case, when we
event_param, we create a new column and attach the event name (or original row) to each new row that has been populated.
SELECT event_name, params.key AS event_param_key, params.value AS event_param_value, FROM `firebase-public-project.analytics_153293282.events_*`, -- update to your table UNNEST (event_params) AS params WHERE _table_suffix = '20181003' -- update to your desired start and end date AND event_name = 'post_score'
If you’d like to go deeper into the process behind
UNNESTING, I urge you to review Todd Kerpelman’s article on Firebase and nested events.
Querying the New Google Analytics 4 Properties Data Model
As we discussed above, each row within our events table represents a single event. Therefore, it’s no surprise that the table schema of our new Google Analytics 4 Properties and Firebase data closely resembles the updated data model of Google Analytics 4 Properties, where events are placed at the forefront of our analysis.
Rather than focusing on sessions, this event-driven data model creates new metrics that enable unified cross-platform behavior. For all data streams and app streams, analysts working in our BigQuery events table will primarily focus on isolating specific activities across users (and user properties) and events (and parameters).
Sample Query 1: Events in Google Analytics 4 Properties
When analyzing your website or mobile application’s performance, you may want to count or sum unique activities as they occur. For our first sample query, we’ll aggregate event activity in conjunction with several other dimensions, like date and
To sum our users’ events, we will use a combination of the event name, the userID, and the timestamp of the event. Concatenating the user id and the timestamp ensures that we have all interactions per user, including multiples if they occurred during the given timeframe. This combination of userID and timestamp can also be considered a unique event ID.
CASE WHEN statement, we supply the argument. In this example, our logic can be described as: when the event name is
first_open, count the concatenated users/timestamp), and return our value.
COUNT(DISTINCT CASE WHEN event_name = 'first_open' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END ) AS first_open,
If you are using a single argument or condition, a simple
IF statement will also do the trick:
SELECT SUM( IF (event_name = 'first_open', 1, 0)) AS first_open FROM `firebase-public-project.analytics_153293282.events_*` WHERE _table_suffix = '20181003' -- update to your desired start and end date'
You can use the following query to isolate events by event name, and sum the number of event instances that occur.
SELECT PARSE_DATE('%Y%m%d', event_date) AS date, stream_id, -- note, stream_id is available for web streams as well app_info.id AS app_id, app_info.version AS app_version, platform AS app_platform, device.mobile_brand_name AS device_brand, device.mobile_model_name AS device_model, -- to count our events, we will use our event_name, and count USERS concatenated with a timestamp -- this gives us each event per user, or all events. COUNT(DISTINCT CASE WHEN event_name = 'first_open' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END ) AS first_open, COUNT(DISTINCT CASE WHEN event_name = 'app_update' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END ) AS app_update, COUNT(DISTINCT CASE WHEN event_name = 'app_remove' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END ) AS app_remove, COUNT(DISTINCT CASE WHEN event_name = 'post_score' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END ) AS post_score FROM `firebase-public-project.analytics_153293282.events_*` -- modify to your project WHERE _table_suffix = '20181003' GROUP BY 1, 2, 3, 4, 5, 6, 7
Let’s take a look at the output. With the above query, we can see a flattened table for our date range. This table contains several specified dimensions for analysis, such as
app_versionapp_plaform, along with the sum of specified events.
How might I use this data? Perhaps I am interested in seeing if our users have an outdated version of our app, thus not posting as many scores. We also may want to see which platforms (Android or iOS) are resulting in the most
This query can be modified to include any dimension that you are interested in! Be sure to reference the Google Analytics 4 Properties and Firebase export schema to see a full list of possible dimensions. Remember, even if you don’t have a mobile application within your digital ecosystem, the above query structure applies to you!
Event Custom Dimensions (Formally Known as Parameters)
In Google Analytics 4 Properties, our events may also be associated several custom dimensions, or details (also known as parameters). This differs from Google Analytics today, where we often associate an Event Category with an Action, Label, and Value. Each event is structured in terms of key-value pairs, where we have an
event_name and parameter values associated with a key, or parameter name.
Another way to think about this model is represented below:
Each parameter value in Google Analytics 4 Properties is also assigned a data type such as a string, integer, double, or a float. To properly query for our Google Analytics 4 Properties events in BigQuery, we’ll need to first understand what parameters are associated with each
event_name, and which data type of parameter we are working with.
When in BigQuery, we can easily see these event parameters by clicking on the "preview" table pane. In the example below for the
user_engagement event, we see that we have several event parameters assigned, including the
engagement_time_msec. You’ll notice that the
screen_origin parameters are associated with string values, while the
engagement_time_msec metrics are associated with integers.
These parameter key-value pairs will be unique for each event. In the example below for
post_score, we have a few new parameter key-value pairs, such as level, time, and score, all of which are doubles.
It won’t be very efficient to manually screen all of your events to better understand these key-value pairs. Instead, we can run a quick query to gain a comprehensive understanding of your events and parameter keys in Google Analytics 4 Properties. This query will return the event parameter values that we want to select by event.
SELECT event_name, params.key AS event_param_key, CASE -- check to see if the key value pairs are populated for each event_name WHEN params.value.string_value IS NOT NULL THEN 'value.string_value' WHEN params.value.int_value IS NOT NULL THEN 'value.int_value ' WHEN params.value.double_value IS NOT NULL THEN 'value.double_value' WHEN params.value.float_value IS NOT NULL THEN 'value.float_value' END AS event_param_value FROM `firebase-public-project.analytics_153293282.events_*`, -- update to your table UNNEST(event_params) AS params -- flatten our event parameters and event names WHERE _table_suffix = '20181003' -- update to your desired start and end date GROUP BY event_name, event_param_key, event_param_value ORDER BY event_name, event_param_key
For the two events we discussed earlier (
post_score), the query above produces the following streamlined and readable output:
After modifying the query above for your own events table, be sure to keep this output handy for your future queries! You’ll want to refer back to it regularly for accessing information.
Note: Alternatively, you could also take advantage of the
COALESCE() function in BigQuery to retrieve each event parameter value (string, integer, float, etc.) that is not null. One way to do this is shown below, but you may also evaluate each parameter separately. Check out the BigQuery
COALESCE() function documentation for more information.
SELECT event_name, params.key AS event_param_key, COALESCE(CAST(value.string_value AS STRING), CAST(value.int_value AS STRING), CAST(value.double_value AS STRING), CAST(value.float_value AS STRING)) AS event_param_data FROM `firebase-public-project.analytics_153293282.events_*`, -- update to your table UNNEST(event_params) AS params -- flatten our event parameters and event names WHERE _table_suffix = '20181003' -- update to your desired start and end date GROUP BY event_name, event_param_key, event_param_data ORDER BY event_name, event_param_key
Sample Query 2: Events & Parameters in Google Analytics 4 Properties
Although understanding a count of events is incredibly useful, you will probably want to access and isolate your event parameters as well for analysis. You can use the following query to
UNNEST events by event name and parameter with key-value pairs by user. This user-level granularity is not available in the Analytics interface, and is one of the key benefits of Google Analytics 4 Properties!
Using the event
post_score from above, let’s understand attribution by level, time, and score.
We see here that we’ll need both
double_value parameters. In our queries, we
UNNEST each event parameter individually so that we can specify the associated value.
level_name, we’ll use the
string_value parameter. In addition to this column, we specify the event name and the
event_parameter key (
( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'post_score' AND key = 'level_name') AS level_name,
To continue building our query, we’ll repeat this process for each key-value pair that we are interested in using subqueries.
SELECT PARSE_DATE('%Y%m%d', event_date) AS date, user_pseudo_id AS user_id, event_name, traffic_source.source AS source, traffic_source.medium AS medium, CONCAT(traffic_source.source,' / ',traffic_source.medium) AS source_medium, traffic_source.name AS campaign, ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'post_score' AND key = 'level_name') AS level_name, ( SELECT value.double_value FROM UNNEST(event_params) WHERE event_name = 'post_score' AND key = 'level') AS level, ( SELECT value.double_value FROM UNNEST(event_params) WHERE event_name = 'post_score' AND key = 'time') AS time, ( SELECT value.double_value FROM UNNEST(event_params) WHERE event_name = 'post_score' AND key = 'score') AS score FROM `firebase-public-project.analytics_153293282.events_*` -- modify to your project WHERE _table_suffix = '20181003' AND event_name = 'post_score' -- filtering on the single event of post_score to avoid NULL values.
Our results will give us a unique row for each
level_name, level, time, and score combination by
user_id and acquisition metrics.
Sample Query 3: Pageview and Screenview Events
Because almost every interaction in Google Analytics 4 Properties and Firebase is an event, we can also capture and report on our pageview and screenview data by
UNNESTING our event parameters.
For website data, we’ll look for
page_view events as seen below.
The following query also makes use of temporary tables using the
WITH pages AS ( SELECT PARSE_DATE('%Y%m%d', event_date) AS date, user_pseudo_id AS user_id, event_name, ( SELECT value.int_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'ga_session_id') AS session_id, ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') AS page_path, ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_title') AS page_title FROM `.analytics_.events_*` -- modify to your project WHERE _table_suffix = '20200811' AND event_name = 'page_view') SELECT date, page_path, page_title, COUNT(page_path) AS pageviews, COUNT(DISTINCT user_id) AS users FROM pages GROUP BY 1, 2, 3 ORDER BY date DESC, pageviews DESC
If we are interested in screenview data (Firebase apps), we’ll modify the above query to include
screen_view event parameters, which differ slightly from our
WITH screens AS ( SELECT PARSE_DATE('%Y%m%d', event_date) AS date, user_pseudo_id AS user_id, event_name, ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'screen_view' AND key = 'firebase_screen') AS firebase_screen, ( SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'screen_view' AND key = 'firebase_screen_class') AS firebase_screen_class FROM `firebase-public-project.analytics_153293282.events_*` -- update to your table WHERE _table_suffix = '20181003' AND event_name = 'screen_view') SELECT date, firebase_screen, firebase_screen_class, COUNT(firebase_screen) AS screenviews, COUNT(DISTINCT user_id) AS users FROM screens GROUP BY 1, 2, 3 ORDER BY date DESC, screenviews DESC
Like our page view information, this query will return a list of screens and associated users and screenviews.
One important feature of the data model that we haven’t touched on is user properties, which are similar to our user-scoped custom dimensions in Google Analytics today. Like our events, user properties have a name and an associated value. Today, you can set up to 25 user properties at the Google Analytics 4 property level. Once you set a user property, it persists for that user until its value changes or is set to null.
Similar to our event params, we can run a quick query to gain a comprehensive understanding of your user properties and associated keys in Google Analytics 4 Properties. This query will return the user properties values that we want to select by user property name. Note here that the only difference from our previous query is that we are
UNNESTING by (
user_properties) instead of (
SELECT user_properties.key AS user_properties_key, CASE -- check to see if the key value pairs are populated for each event_name WHEN user_properties.value.string_value IS NOT NULL THEN 'value.string_value' WHEN user_properties.value.int_value IS NOT NULL THEN 'value.int_value ' WHEN user_properties.value.double_value IS NOT NULL THEN 'value.double_value' WHEN user_properties.value.float_value IS NOT NULL THEN 'value.float_value' END AS user_properties_value FROM `firebase-public-project.analytics_153293282.events_*`, -- update to your table UNNEST(user_properties) AS user_properties -- flatten our user properties WHERE _table_suffix = '20181003' -- update to your desired start and end date GROUP BY user_properties_key, user_properties_value ORDER BY user_properties_key
Sample Query 4: User Properties
To analyze user properties, we will
UNNEST by (
user_properties) to access our,
user_property_key and parameter value. Again, this user-level granularity may not always be available in the Analytics interface, and can be considered a key benefit of linking BigQuery!
In this example, we may be interested in seeing which users play both quickplay and progressive, or maybe just one or the other. We’ll
UNNEST each user property by key, and count the associated users.
WITH user_properties AS ( SELECT PARSE_DATE('%Y%m%d', event_date) AS date, user_pseudo_id AS user_id, ( SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'plays_quickplay') AS plays_quickplay, ( SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'plays_progressive') AS plays_progressive FROM `firebase-public-project.analytics_153293282.events_*` -- update to your table WHERE _table_suffix = '20181003') -- update to your desired start and end date SELECT date, plays_quickplay, plays_progressive, COUNT(DISTINCT user_id) AS users FROM user_properties GROUP BY 1, 2, 3 ORDER BY date DESC, users DESC
The output of this query shows us that, for our given date range, users are most likely to play neither progressive nor quickplay, nor a combination of both games.
Google Analytics 4 Properties & Firebase Query Strategies
After walking through the example queries above, you’ve probably noticed a pattern. In Google Analytics 4 Properties and Firebase, we’ll need to isolate our event data and parameters (or user property information) and create several subqueries to handle our key-value pairs.
Overall, however, the amount of query complexity has been significantly reduced from our traditional Google Analytics BigQuery sessions table. This is primarily due to the simplified data model—events and user properties–that’s it!
Available Metrics & Dimensions in BigQuery
Another important detail about all Google Analytics data in BigQuery (Google Analytics 4 Properties and Google Analytics 360 data) is the fact that you will need to manually calculate many of the metrics that you see in the interface—they are not all provided for you!
In Google Analytics 4 Properties, this may mean calculating new users, count of sessions, or engaged sessions, among others. We generally have far fewer metrics that we need to calculate in our Google Analytics 4 Properties to match our reporting interface than in our traditional Google Analytics 360 sessions tables, but we will still often need to reference the export schema to see what is available to us.
For example, new users is a "missing" metric in our BigQuery export schema that we can easily recreate in Google Analytics 4 Properties using the strategies that we defined above. To do so, we’ll isolate our
session_start event, which is one of our automatically collected events in Google Analytics 4 Properties. With this event, we can isolate the
ga_session_number, and determine if our user is new or returning.
COUNT(DISTINCT CASE WHEN ( SELECT Value.int_value FROM UNNEST(event_params) WHERE event_name = 'session_start' AND key = 'ga_session_number') = 1 THEN user_pseudo_id ELSE NULL END ) AS new_users
The Sessions Metric and Other Universal Analytics Dilemmas
With the migration to Google Analytics 4 Properties, you may experience organizational anxiety over the changing data model. Many of you will be asked by your business users to recreate some of the dimensions or metrics that we see in our Universal Analytics (Google Analytics 360 data) today that are not part of the Google Analytics 4 Properties data model. Most notably, you may be asked to recreate session-scoped metrics such as sessions and bounce rates, among others. You may also be asked to recreate landing pages or exit pages within these sessions.
Is this possible with our Google Analytics 4 Properties data? The short answer is yes, and in fact, multiple industry experts have already drafted several queries to address these problems. However, before you start to go down the bounce rate rabbit hole, ask yourself if session-scoped metrics are the best metrics and dimensions to represent your changing business.
After some reflection, you may find that the new events data model fits your business profile more accurately. An event and user-based data model gives companies the ability to quickly determine cross-channel marketing impact, uncover customer journeys across platforms, evaluate overall user experience all in one unified package. It also eliminates one more confusion area of digital analytics—scoping concerns!
Ultimately, we’d recommend using the transition to Google Analytics 4 Properties as a time to re-evaluate your data collection and reporting strategy as a whole. If you can shift your thinking to an event-based world, you may be better prepared to take advantage of any new reporting that comes along in the Google Analytics 4 Properties interface.
Gotchas When Working With BigQuery Google Analytics 4 Properties Data
The following list represents some high-level challenges or "gotchas" that we’ve faced when reporting on our Google Analytics 4 Properties data in BigQuery.
Data Differences Between the Interface & BigQuery
If you’ve worked in BigQuery before, you are familiar with the differences between the raw hit-level export data vs. the aggregated reports in Google Analytics. This aggregation is one of the reasons that we generally need to recreate several common metrics in BigQuery. The data model and export schema aren’t actually "missing" data, instead, the Google Analytics reports are pre-aggregating data for us within the interface.
Because Google Analytics reports consist of pre-aggregated data, there are often small differences (~1-2 percent) between our calculations in BigQuery and the Google Analytics interface reports. Most of this still holds true for our Firebase and Google Analytics 4 Properties export, especially relating to users.
Users in BigQuery vs. Google Analytics
One of the most common differences that you will see in BigQuery vs. Google Analytics is the user count. The pre-aggregated reports in Google Analytics rely on a probabilistic cardinality estimation algorithm called HyperLogLog ++. This algorithm approximates our user count in the analytics interface, and generally is within an acceptable range (~1-2 percent).
However, when we calculate our users in BigQuery, we rely on the underlying hit-level data. While our numbers in BigQuery may not exactly match what we see in Google Analytics, our BigQuery calculations are more accurate! Because the aggregated reports are run daily, as you increase your date range you may find that the percentage variance increases exponentially over time.
In summary, Google’s pre-aggregated tables enable Google Analytics to serve us data more quickly, but often take a few steps to recreate in BigQuery. We may also find that with aggregation over time comes less accuracy, so don’t fret if you see differences in BigQuery and Google Analytics over time.
App Data: Your SDK Version Matters
When analyzing Firebase mobile app data, your software development kit version matters! There are two layers of SDK consideration: 1) the version of Firebase Analytics installed within your app combined with 2) the version of the app that your users access on their devices. Both of these SDK considerations will often make a difference in the data that you see in BigQuery.
This is because the Firebase Analytics SDK (and Core SDK) experience numerous releases and enhancements throughout the year. These releases and enhancements generally result in newly available features and data, such as the more recent additions of the session ID or eCommerce objects. If your app is using an outdated version, or if your users are using an outdated app installation (even if your most recent push uses up-to-date analytics), you may see data discrepancies between users.
If a recent release, such as enhanced eCommerce, is of utmost importance to your organization, you can always push a forced upgrade to your app users. However, consider the implications of your user experience before doing so.
eCommerce: There’s a New Array for That!
Speaking of eCommerce, Google has released a new items array for Google Analytics 4 Properties website (Google Tag Manager or gtag) implementations and Firebase app projects. This array-based tracking method is very similar to our existing eCommerce implementation in Universal Analytics, but with a few caveats.
To get the most out of your enhanced eCommerce will require a few queries as well as a visualization tool of your choice (perhaps Data Studio or Tableau, for example).
In BigQuery, we’ll access our eCommerce array values with each associated eCommerce event. For web hits, the eCommerce gtag documentation does a nice job of summarizing the types of events that you may want to collect, including
add_to_cart, and purchase
Note: eCommerce measurement requires you to link your Firebase project to an Analytics account and to have the Android SDK v17.3.0 or iOS v6.20.0 and up in your app.
Sample Query 5: eCommerce in BigQuery
In this sample query, let’s take a look at
UNNESTING eCommerce values within a purchase event. To do so, we’ll use the key-value pairs that we’d like to report on, such as
transaction_id, and add in our associated transaction metrics from the BigQuery export schema.
For eCommerce, there are two data types that we will primarily use:
ecommerce and items. The
ecommerce.() dimensions store transaction information, or are records surrounding items like revenue, quantity, and total items. These values are already "flattened" and are directly associated with the purchase event in BigQuery.
Using this information, we can isolate our transactions by date and
transaction_id, which will give us a high-level understanding of performance.
SELECT event_date AS date, ( SELECT value.string_value FROM UNNEST(event_params) WHERE key = "transaction_id") AS transaction_id, ecommerce.total_item_quantity AS quantity, ROUND(ecommerce.purchase_revenue, 2) AS revenue, ROUND(ecommerce.shipping_value, 2) AS shipping, ROUND(ecommerce.tax_value, 2) AS tax, ecommerce.unique_items AS unique_items FROM `.analytics_.events_*` -- modify to your project WHERE event_name = "purchase" ORDER BY revenue DESC
Sample Query 6: The Items Array in BigQuery
We also have product-level information in our
items.() dimensions. These items are also associated with an event in BigQuery.
Let’s take a look at the
view_item_list event, which represents an impression of a product/item list on your website. Here, we use our event in our
WHERE clause, but are also
UNNESTING by the items array:
SELECT event_date AS date, event_name, items.item_id, items.item_name, items.item_brand, items.item_variant, items.item_category, items.item_category2, items.item_category3, items.item_category4, items.item_category5, items.price, items.quantity, items.affiliation, items.location_id FROM `.analytics_.events_*`, -- modify to your project UNNEST(items) AS items WHERE event_name = "view_item_list"
This query returns a very streamlined table, which shows us a unique row for each event. We could also
SUM() these values to count the number of items lists that were viewed for future analysis.
Final eCommerce Observations
Because there are limited (as of Q3 2020) aggregated eCommerce reports within the Google Analytics 4 Properties interface, you will want to be extra cautious with your query building. Essentially, there may not be a pre-fabricated report to validate your query findings against, so be sure to run your new eCommerce implementation in parallel with your Google Analytics 360 properties for peace of mind.
Final Comparisons Between Google Analytics 4 Properties & Universal Analytics (GA360) in BigQuery
We’ve covered a lot of material relating to the new Google Analytics 4 Properties and BigQuery strategies. However, if you are already an avid user of the tool for Google Analytics 360 data, but need a guideline on the differences between the two tables, you may find the following comparison sheet useful as you make the switch to the new data model.
|Topic||Google Analytics 4 Properties Events Table||Google Anlaytics 360 Sessions Table|
|Cost||Free Connection (integration), with associated query & storage costs.||Connection included with Google Analytics 360 subscription (integration), with associated query & storage costs.|
|Integration Process||Occurs in the Firebase Console or in the Google Analytics 4 Property Interface.||Occurs directly in the Google Analytics Interface under Property Settings.|
|Table Structure||In our GA4 & Firebase tables, each row represents a single event.||In the Universal Analytics tables, each row represents a single session.|
|Data Studio Connectivity||Google Analytics or BigQuery Connector.||Google Analytics or BigQuery Connector|
|Custom Event Reporting||Each event_name will be unnested with associated event parameters.||Each Event Category, Action, and Label are separate hit-scoped fields.|
|Custom User Reporting||Each user_property will be unnested with associated values.||Each User-Scoped custom dimension must be unnested.|
|eCommerce Reporting||The items.() array and eCommerce.() data can be accessed for each event, and unnested.||eCommerce data must be accessed through totals.(), hits.(), or product.() nested tables.|
Reference Material & Additional Reading
While we hope this post has been comprehensive, we know that there is so much to learn when considering Google Analytics 4 Properties and BigQuery. The following represents a few great resources to continue your education:
- Google Analytics 4 Product Announcement
- Google Analytics 4 Properties overview
- Google Analytics 4 Properties Reporting overview
- BigQuery Export Schema
- Firebase Public Project Sample Data Set
- Firebase Sample Data Sets:
- How to Connect BigQuery to a Google Analytics 4 Property
- Starter Guide for Google Analytics 4 Properties & BigQuery
- More Sample Queries
- Controlling Costs in BigQuery
- Data Studio, Create a BigQuery Data Source
- Data Studio, Connect to BigQuery
- Data Studio, Visualizing BigQuery Data