Ultimate Guide to Querying Google Analytics 4 Properties or Firebase Data in BigQuery

September 18, 2020 | Abby Matchett
Ultimate Guide to Querying App + Web or Firebase Data in BigQuery blog image

Editor's Note: This article is about Universal Analytics properties, which will stop processing data on July 1, 2023 (July 2024 for Analytics 360 properties). For Tracking Single Page Applications with Google Analytics 4 Properties, please refer to this article instead.

If you’ve been on the fence about implementing Google Analytics 4 Properties (and/or Firebase Analytics), let us incentivize you to take the plunge: the BigQuery connection is free for all Google Analytics 4 Properties (formerly App + Web)!

While subscribers of Google Analytics 360 will be familiar with the power and flexibility that BigQuery offers, users of the free tier of Google Analytics have never before been offered this opportunity. With Google Analytics 4 Properties and Firebase, all product users are given access to hit-level unsampled data—both in the interface and most importantly, within Google BigQuery.

In the following discussion, we’ll provide resources for getting started with BigQuery and understanding the functionality of the tool. We’ll also go past implementation to discuss the important features of our Google Analytics 4 Properties event schema, the new data model, and any gotchas between historical Google Analytics web data and Google Analytics 4 Properties information (including Firebase data). Our goal is to leave you feeling confident in your next steps, and also give you a reference for any future questions that may come your way.

Table of Contents:

What’s the Deal with BigQuery, Google Analytics 4 Properties, & Firebase?

If you’ve been working with Google Analytics or Google Marketing Platform data for a while, chances are that you’ve heard of Google BigQuery, Google Cloud Platform’s advanced database solution. While we praise the analytics interface for its accessibility and reporting ease, BigQuery provides our organizations with the opportunity to conduct a much deeper and more customized analysis for our business. Instead of focusing on predefined metrics and overall trend analysis, BigQuery enables us to uncover insights that are simply not possible without a little data transformation.

Additionally, BigQuery enables us to connect our Google Analytics and Firebase data with other third-party data sources, and even export our data to our visualization platforms like Data Studio, Tableau, PowerBI, among others. While there's now a native Data Studio Connector to Google Analytics 4 Properties, you may also want the flexibility of the Data Studio BigQuery connector for more complex reporting options

Since Data Studio offers several benefits to most organizations, including branding capabilities, enhanced sharing functionality, and calculated fields (among others), connecting this tool to customized reporting tables may be the only nudge that you need to get started with BigQuery. However, if you need a little bit more convincing that BigQuery is for you and your organization, check out my colleague’s blog post on the topic.

The Value Of Google BigQuery And Google Analytics 360
By Alex Moore,
SVP of Analytics & Insight
May 17, 2017

I’m Sold—How Do I Get Connected

Before we can dive into our data and begin writing SQL queries, we’ll need to connect our Google Analytics 4 Properties and/or Firebase projects to Google Cloud Platform.

For a step-by-step guide on how to get connected, visit my other post on the topic, Google Analytics 360 Starter Guide For Google BigQuery.

Note: If you have Google Analytics 360 but haven’t set up Google Analytics 4 Properties or Firebase yet, you can (and should!) also link your Universal Analytics property to BigQuery. .

The (Not-so-New) Events Table

Once your Firebase and/or Google Analytics 4 property is connected to BigQuery, you will start to see several datasets available to you for reporting. While we’re primarily interested in Google Analytics data for this discussion, which does not require the use of Firebase or mobile applications, it's useful to note that Firebase gives you access to several datasets that may be of interest to your company.

Google Analytics 4 Properties & Firebase Data in BigQuery

When you link your Firebase Project to BigQuery, there will likely be several supported and recommended product integrations for reporting.

image showing setup of how to link firebase to bigquery

  • Crashlytics enables you to export fatal, non-fatal, and stack trace data for each of your apps
  • The Cloud Messaging export enables you to evaluate whether the messages you send reach your users
  • The Performance export creates a table that includes all the captured performance events
  • The Predictions export creates a table for each app in the project that includes the Predictions data for each user
  • You’ll also be able to access all analytics data related to your Firebase A/B tests in BigQuery

Today, we’re going to focus on the Google Analytics export, otherwise known as the Events table in BigQuery. However, if you are using Firebase, be sure to take advantage of these additional features and data sets when connecting BigQuery. If you do not have a mobile application, but are using a Google Analytics 4 property for your web streams, you will only need access to the Events table.

The Google Analytics Events Table

In BigQuery, your Google Analytics 4 Properties & Firebase tables will be located within your analytics dataset, and will begin with events_(). If you’ve already linked your Firebase mobile app data to BigQuery and are simply extending this integration with web data streams, this next section will look very familiar to you because the table structure is the same! The primary difference in our data will be the addition of data streams and several new fields for web activity.

Datasets

A single dataset named "analytics_(your property id)" will populate for each Firebase and/or Google Analytics 4 Properties project that is linked to BigQuery. If you have Firebase and Data Streams in Google Analytics 4 Properties, each app and data stream for which BigQuery exporting is enabled will export its data to that single dataset.

Tables

For each day of export, a table is imported within each analytics dataset. These tables will be formatted and separated by date, and appear as "events_YYYYMMDD." Because the events table is processed daily (and stored in an individual table per day), the numbers next to the event table (e.g. 145 below) represent the number of daily partitioned tables available to you for analysis. The events tables are on a one-day lag, meaning that your Tuesday, August 2nd data will arrive on Wednesday, August 3rd, for example.

Additionally, a table may be imported for Google Analytics 4 Properties events received throughout the current day. This table is named "events_intraday_YYYYMMDD" and it is populated in near real-time as app events are collected.

image of events table

The Google Analytics Sessions Table

The new Google Analytics 4 Properties format differs slightly from our traditional Google Analytics 360 daily and intraday exports. In our Google Analytics 360 exports, each dataset corresponds to a view within Google Analytics rather than a Firebase project. Although we receive a daily ga_sessions table once we are integrated with Google Analytics, the table schemas themselves are also very unique.

image of sessions table

This distinction is important, and isn’t just naming conventions! Our events tables are structured very differently from our sessions tables, and therefore will be queried in different ways. We’ll dive deeper into our queries in a moment.

The Nested Table Structure

Similar to our Google Analytics 360 sessions table export, our Google Analytics 4 Properties events tables feature a nested and repeated fields structure.

In our Google Analytics 4 Properties and Firebase tables, each row represents a single event. For example, in the image below we see a screen_view event, which houses several associated (and nested) event parameters and several columns of dimensions and metrics. Our events will contain user and event information, event parameters, and user properties.

To reference a complete list of columns and data, you’ll want to review the Google Analytics 4 Properties and Firebase export schema.

This simplified Google Analytics 4 Properties data model is a stark contrast to our Google Analytics 360 export where each row represents a single session. For example, in the image below we see a single session denoted by a visitID and a visit start time, followed by several associated (and nested) columns composed of dimensions and metrics. In our traditional Google Analytics exports, one session will compose all events, custom dimensions, and interactions. You may want to refer back to the Google Analytics export schema to isolate each column available for reporting. 

Because Google Analytics data is hierarchical, you will often find yourself needing to write a query that accesses values from nested or repeated fields. In Universal Analytics (Google Analytics 360), this may be accessing custom dimension values, for example. In Google Analytics 4 Properties and Firebase, this may be accessing event parameters.

In general, we’ve found that the hierarchical structure of our new Firebase and Google Analytics 4 Properties tables is far less complicated than that of our traditional Universal Analytics properties. If the idea of unnesting or flattening Google Analytics tables in BigQuery seemed daunting in the past, don’t fret! Building a query in Google Analytics 4 Properties is generally much easier than in Universal Analytics.

The UNNEST function

To start to query your Google Analytics 4 Properties data, you’ll often need to flatten, or create repeated rows, for each event. This is because common data points such as event_params or user_params are nested within each event.

In the following example, we see that each event parameter (level or score, for example) are contained within the same event row.

SELECT
 *
FROM
  `firebase-public-project.analytics_153293282.events_*` -- update to your table
WHERE
  _table_suffix = '20181003' -- update to your desired start and end date
  AND event_name = 'post_score'

When you use the UNNEST function, you’re essentially taking each parameter, and attaching it to a new column where any aggregated column is repeated. In this case, when we UNNEST an event_param, we create a new column and attach the event name (or original row) to each new row that has been populated.

SELECT
  event_name,
  params.key AS event_param_key,
  params.value AS event_param_value,
FROM
  `firebase-public-project.analytics_153293282.events_*`, -- update to your table
   UNNEST (event_params) AS params
WHERE
  _table_suffix = '20181003' -- update to your desired start and end date
  AND event_name = 'post_score'

If you’d like to go deeper into the process behind UNNESTING, I urge you to review Todd Kerpelman’s article on Firebase and nested events.

Querying the New Google Analytics 4 Properties Data Model

As we discussed above, each row within our events table represents a single event. Therefore, it’s no surprise that the table schema of our new Google Analytics 4 Properties and Firebase data closely resembles the updated data model of Google Analytics 4 Properties, where events are placed at the forefront of our analysis.

Rather than focusing on sessions, this event-driven data model creates new metrics that enable unified cross-platform behavior. For all data streams and app streams, analysts working in our BigQuery events table will primarily focus on isolating specific activities across users (and user properties) and events (and parameters).

Sample Query 1: Events in Google Analytics 4 Properties

When analyzing your website or mobile application’s performance, you may want to count or sum unique activities as they occur. For our first sample query, we’ll aggregate event activity in conjunction with several other dimensions, like date and stream_id.

To sum our users’ events, we will use a combination of the event name, the userID, and the timestamp of the event. Concatenating the user id and the timestamp ensures that we have all interactions per user, including multiples if they occurred during the given timeframe. This combination of userID and timestamp can also be considered a unique event ID.

With a CASE WHEN statement, we supply the argument. In this example, our logic can be described as: when the event name is first_open, count the concatenated users/timestamp), and return our value.

COUNT(DISTINCT
      CASE
        WHEN event_name = 'first_open' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))
    END
      ) AS first_open,

If you are using a single argument or condition, a simple IF statement will also do the trick:

SELECT
  SUM(
  IF
    (event_name = 'first_open',
      1,
      0)) AS first_open
FROM
  `firebase-public-project.analytics_153293282.events_*`
WHERE
  _table_suffix = '20181003' -- update to your desired start and end date'

You can use the following query to isolate events by event name, and sum the number of event instances that occur.

SELECT
  PARSE_DATE('%Y%m%d',
    event_date) AS date,
  stream_id,
  -- note, stream_id is available for web streams as well
  app_info.id AS app_id,
  app_info.version AS app_version,
  platform AS app_platform,
  device.mobile_brand_name AS device_brand,
  device.mobile_model_name AS device_model,
  -- to count our events, we will use our event_name, and count USERS concatenated with a timestamp
  -- this gives us each event per user, or all events.
  COUNT(DISTINCT
    CASE
      WHEN event_name = 'first_open' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))
  END
    ) AS first_open,
  COUNT(DISTINCT
    CASE
      WHEN event_name = 'app_update' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))
  END
    ) AS app_update,
  COUNT(DISTINCT
    CASE
      WHEN event_name = 'app_remove' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))
  END
    ) AS app_remove,
  COUNT(DISTINCT
    CASE
      WHEN event_name = 'post_score' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING))
  END
    ) AS post_score
FROM
  `firebase-public-project.analytics_153293282.events_*` -- modify to your project
WHERE
  _table_suffix = '20181003'
GROUP BY
  1,
  2,
  3,
  4,
  5,
  6,
  7

Let’s take a look at the output. With the above query, we can see a flattened table for our date range. This table contains several specified dimensions for analysis, such as app_id, app_versionapp_plaform, along with the sum of specified events.

How might I use this data? Perhaps I am interested in seeing if our users have an outdated version of our app, thus not posting as many scores. We also may want to see which platforms (Android or iOS) are resulting in the most app_updates.

This query can be modified to include any dimension that you are interested in! Be sure to reference the Google Analytics 4 Properties and Firebase export schema to see a full list of possible dimensions. Remember, even if you don’t have a mobile application within your digital ecosystem, the above query structure applies to you!

Event Custom Dimensions (Formally Known as Parameters)

In Google Analytics 4 Properties, our events may also be associated several custom dimensions, or details (also known as parameters). This differs from Google Analytics today, where we often associate an Event Category with an Action, Label, and Value. Each event is structured in terms of key-value pairs, where we have an event_name and parameter values associated with a key, or parameter name.

image showing Event Parameters differences between google analytics and app + web

Another way to think about this model is represented below:

GA vs App + Web Event Parameters

Each parameter value in Google Analytics 4 Properties is also assigned a data type such as a string, integer, double, or a float. To properly query for our Google Analytics 4 Properties events in BigQuery, we’ll need to first understand what parameters are associated with each event_name, and which data type of parameter we are working with.

When in BigQuery, we can easily see these event parameters by clicking on the "preview" table pane. In the example below for the user_engagement event, we see that we have several event parameters assigned, including the screen_class, event_origin, screen_id, and engagement_time_msec. You’ll notice that the screen_class and screen_origin parameters are associated with string values, while the screen_id and engagement_time_msec metrics are associated with integers.

These parameter key-value pairs will be unique for each event. In the example below for post_score, we have a few new parameter key-value pairs, such as level, time, and score, all of which are doubles.

It won’t be very efficient to manually screen all of your events to better understand these key-value pairs. Instead, we can run a quick query to gain a comprehensive understanding of your events and parameter keys in Google Analytics 4 Properties. This query will return the event parameter values that we want to select by event.

SELECT
  event_name,
  params.key AS event_param_key,
  CASE
    -- check to see if the key value pairs are populated for each event_name
    WHEN params.value.string_value IS NOT NULL THEN 'value.string_value'
    WHEN params.value.int_value IS NOT NULL THEN 'value.int_value '
    WHEN params.value.double_value IS NOT NULL THEN 'value.double_value'
    WHEN params.value.float_value IS NOT NULL THEN 'value.float_value'
END
  AS event_param_value
FROM
  `firebase-public-project.analytics_153293282.events_*`, -- update to your table
  UNNEST(event_params) AS params -- flatten our event parameters and event names
WHERE
  _table_suffix = '20181003' -- update to your desired start and end date
GROUP BY
  event_name,
  event_param_key,
  event_param_value
ORDER BY
  event_name,
  event_param_key

For the two events we discussed earlier (user_engagement and post_score), the query above produces the following streamlined and readable output:

event_name event_param_key event_param_value
post_score firebase_event_origin value.string_value
post_score firebase_screen_class value.string_value
post_score firebase_screen_id value.double_value
post_score level value.double_value
post_score level_name value.string_value
post_score score value.double_value
post_score time value.double_value
user_engagement engagement_time_msec value.int_value
user_engagement firebase_event_origin value.string_value
user_engagement firebase_screen value.string_value
user_engagement firebase_screen_class value.string_value
user_engagement firebase_screen_id value.int_value

After modifying the query above for your own events table, be sure to keep this output handy for your future queries! You’ll want to refer back to it regularly for accessing information. 

Note: Alternatively, you could also take advantage of the COALESCE() function in BigQuery to retrieve each event parameter value (string, integer, float, etc.) that is not null. One way to do this is shown below, but you may also evaluate each parameter separately. Check out the BigQuery COALESCE() function documentation for more information. 

SELECT
  event_name,
  params.key AS event_param_key,
    COALESCE(CAST(value.string_value AS STRING),
      CAST(value.int_value AS STRING),
      CAST(value.double_value AS STRING),
      CAST(value.float_value AS STRING)) AS event_param_data
  FROM
    `firebase-public-project.analytics_153293282.events_*`,
    -- update to your table
    UNNEST(event_params) AS params -- flatten our event parameters and event names
  WHERE
    _table_suffix = '20181003' -- update to your desired start and end date
  GROUP BY
    event_name,
    event_param_key,
    event_param_data 
  ORDER BY
    event_name,
    event_param_key
Row event_name event_param_key event_param_data
1 post_score firebase_event_origin app+gtm
2 post_score firebase_event_origin app
3 post_score firebase_screen_class game_board
4 post_score firebase_screen_class game_over
5 post_score firebase_screen_class FIRootViewController
6 post_score firebase_screen_id -88947054873039632
7 post_score firebase_screen_id 1.6635744694409628e+18
8 post_score firebase_screen_id 3.2065888709315773e+18
9 post_score firebase_screen_id -1.5231692623724792e+18
10 post_score firebase_screen_id -6.9106535330285855e+18

Sample Query 2: Events & Parameters in Google Analytics 4 Properties

Although understanding a count of events is incredibly useful, you will probably want to access and isolate your event parameters as well for analysis. You can use the following query to UNNEST events by event name and parameter with key-value pairs by user. This user-level granularity is not available in the Analytics interface, and is one of the key benefits of Google Analytics 4 Properties!

Using the event post_score from above, let’s understand attribution by level, time, and score.

We see here that we’ll need both string_value and double_value parameters. In our queries, we UNNEST each event parameter individually so that we can specify the associated value.

For the level_name, we’ll use the string_value parameter. In addition to this column, we specify the event name and the event_parameter key (level_name):

(
  SELECT
    value.string_value
  FROM
    UNNEST(event_params)
  WHERE
    event_name = 'post_score'
    AND key = 'level_name') AS level_name,

To continue building our query, we’ll repeat this process for each key-value pair that we are interested in using subqueries.

SELECT
  PARSE_DATE('%Y%m%d',
    event_date) AS date,
  user_pseudo_id AS user_id,
  event_name,
  traffic_source.source AS source,
  traffic_source.medium AS medium,
  CONCAT(traffic_source.source,' / ',traffic_source.medium) AS source_medium,
  traffic_source.name AS campaign,
  (
  SELECT
    value.string_value
  FROM
    UNNEST(event_params)
  WHERE
    event_name = 'post_score'
    AND key = 'level_name') AS level_name,
  (
  SELECT
    value.double_value
  FROM
    UNNEST(event_params)
  WHERE
    event_name = 'post_score'
    AND key = 'level') AS level,
    (
  SELECT
    value.double_value
  FROM
    UNNEST(event_params)
  WHERE
    event_name = 'post_score'
    AND key = 'time') AS time,
      (
  SELECT
    value.double_value
  FROM
    UNNEST(event_params)
  WHERE
    event_name = 'post_score'
    AND key = 'score') AS score
 FROM
 `firebase-public-project.analytics_153293282.events_*` -- modify to your project
  WHERE
    _table_suffix = '20181003'
  AND event_name = 'post_score' -- filtering on the single event of post_score to avoid NULL values.

Our results will give us a unique row for each level_name, level, time, and score combination by user_id and acquisition metrics.

Sample Query 3: Pageview and Screenview Events

Because almost every interaction in Google Analytics 4 Properties and Firebase is an event, we can also capture and report on our pageview and screenview data by UNNESTING our event parameters. 

For website data, we’ll look for page_view events as seen below.

The following query also makes use of temporary tables using the WITH statement.

WITH
  pages AS (
  SELECT
    PARSE_DATE('%Y%m%d',
      event_date) AS date,
    user_pseudo_id AS user_id,
    event_name,
    (
    SELECT
      value.int_value
    FROM
      UNNEST(event_params)
    WHERE
      event_name = 'page_view'
      AND key = 'ga_session_id') AS session_id,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      event_name = 'page_view' 
      AND key = 'page_location') AS page_path,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      event_name = 'page_view' 
      AND key = 'page_title') AS page_title
  FROM
  `[].analytics_[].events_*` -- modify to your project
  WHERE
    _table_suffix = '20200811'
    AND event_name = 'page_view')
SELECT
  date,
  page_path,
  page_title,
  COUNT(page_path) AS pageviews,
  COUNT(DISTINCT user_id) AS users
FROM
  pages
GROUP BY
  1,
  2,
  3
ORDER BY
date DESC,
pageviews DESC

If we are interested in screenview data (Firebase apps), we’ll modify the above query to include screen_view event parameters, which differ slightly from our page_view parameters:

WITH
  screens AS (
  SELECT
    PARSE_DATE('%Y%m%d',
      event_date) AS date,
    user_pseudo_id AS user_id,
    event_name,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      event_name = 'screen_view'
      AND key = 'firebase_screen') AS firebase_screen,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(event_params)
    WHERE
      event_name = 'screen_view'
      AND key = 'firebase_screen_class') AS firebase_screen_class
 FROM
    `firebase-public-project.analytics_153293282.events_*` -- update to your table
  WHERE
    _table_suffix = '20181003'
     AND event_name = 'screen_view')
SELECT
  date,
  firebase_screen,
  firebase_screen_class,
  COUNT(firebase_screen) AS screenviews,
  COUNT(DISTINCT user_id) AS users
FROM
  screens
GROUP BY
  1,
  2,
  3
ORDER BY
date DESC,
screenviews DESC

Like our page view information, this query will return a list of screens and associated users and screenviews.

User Properties

One important feature of the data model that we haven’t touched on is user properties, which are similar to our user-scoped custom dimensions in Google Analytics today. Like our events, user properties have a name and an associated value. Today, you can set up to 25 user properties at the Google Analytics 4 property level. Once you set a user property, it persists for that user until its value changes or is set to null.

Similar to our event params, we can run a quick query to gain a comprehensive understanding of your user properties and associated keys in Google Analytics 4 Properties. This query will return the user properties values that we want to select by user property name. Note here that the only difference from our previous query is that we are UNNESTING by (user_properties) instead of (event_params).

SELECT
  user_properties.key AS user_properties_key,
  CASE
    -- check to see if the key value pairs are populated for each event_name
    WHEN user_properties.value.string_value IS NOT NULL THEN 'value.string_value'
    WHEN user_properties.value.int_value IS NOT NULL THEN 'value.int_value '
    WHEN user_properties.value.double_value IS NOT NULL THEN 'value.double_value'
    WHEN user_properties.value.float_value IS NOT NULL THEN 'value.float_value'
END
  AS user_properties_value
FROM
  `firebase-public-project.analytics_153293282.events_*`, -- update to your table
  UNNEST(user_properties) AS user_properties -- flatten our user properties 
WHERE
  _table_suffix = '20181003' -- update to your desired start and end date
GROUP BY
  user_properties_key,
  user_properties_value
ORDER BY
  user_properties_key

Sample Query 4: User Properties

To analyze user properties, we will UNNEST by (user_properties) to access our, user_property_key and parameter value. Again, this user-level granularity may not always be available in the Analytics interface, and can be considered a key benefit of linking BigQuery!

In this example, we may be interested in seeing which users play both quickplay and progressive, or maybe just one or the other. We’ll UNNEST each user property by key, and count the associated users.

WITH
  user_properties AS (
  SELECT
    PARSE_DATE('%Y%m%d',
      event_date) AS date,
    user_pseudo_id AS user_id,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(user_properties)
    WHERE
      key = 'plays_quickplay') AS plays_quickplay,
    (
    SELECT
      value.string_value
    FROM
      UNNEST(user_properties)
    WHERE
      key = 'plays_progressive') AS plays_progressive
  FROM
    `firebase-public-project.analytics_153293282.events_*` -- update to your table
  WHERE
    _table_suffix = '20181003') -- update to your desired start and end date
SELECT
  date,
  plays_quickplay,
  plays_progressive,
  COUNT(DISTINCT user_id) AS users
FROM
  user_properties
GROUP BY
  1,
  2,
  3
ORDER BY
  date DESC,
  users DESC

The output of this query shows us that, for our given date range, users are most likely to play neither progressive nor quickplay, nor a combination of both games. 

Google Analytics 4 Properties & Firebase Query Strategies

After walking through the example queries above, you’ve probably noticed a pattern. In Google Analytics 4 Properties and Firebase, we’ll need to isolate our event data and parameters (or user property information) and create several subqueries to handle our key-value pairs.

Overall, however, the amount of query complexity has been significantly reduced from our traditional Google Analytics BigQuery sessions table. This is primarily due to the simplified data model—events and user properties–that’s it!

Available Metrics & Dimensions in BigQuery

Another important detail about all Google Analytics data in BigQuery (Google Analytics 4 Properties and Google Analytics 360 data) is the fact that you will need to manually calculate many of the metrics that you see in the interface—they are not all provided for you!

In Google Analytics 4 Properties, this may mean calculating new users, count of sessions, or engaged sessions, among others. We generally have far fewer metrics that we need to calculate in our Google Analytics 4 Properties to match our reporting interface than in our traditional Google Analytics 360 sessions tables, but we will still often need to reference the export schema to see what is available to us.

For example, new users is a "missing" metric in our BigQuery export schema that we can easily recreate in Google Analytics 4 Properties using the strategies that we defined above. To do so, we’ll isolate our session_start event, which is one of our automatically collected events in Google Analytics 4 Properties. With this event, we can isolate the ga_session_number, and determine if our user is new or returning.

    COUNT(DISTINCT
      CASE
        WHEN ( SELECT Value.int_value FROM UNNEST(event_params) WHERE event_name = 'session_start' AND key = 'ga_session_number') = 1 THEN user_pseudo_id
      ELSE
      NULL
    END
      ) AS new_users

The Sessions Metric and Other Universal Analytics Dilemmas

With the migration to Google Analytics 4 Properties, you may experience organizational anxiety over the changing data model. Many of you will be asked by your business users to recreate some of the dimensions or metrics that we see in our Universal Analytics (Google Analytics 360 data) today that are not part of the Google Analytics 4 Properties data model. Most notably, you may be asked to recreate session-scoped metrics such as sessions and bounce rates, among others. You may also be asked to recreate landing pages or exit pages within these sessions.

Is this possible with our Google Analytics 4 Properties data? The short answer is yes, and in fact, multiple industry experts have already drafted several queries to address these problems. However, before you start to go down the bounce rate rabbit hole, ask yourself if session-scoped metrics are the best metrics and dimensions to represent your changing business.

After some reflection, you may find that the new events data model fits your business profile more accurately. An event and user-based data model gives companies the ability to quickly determine cross-channel marketing impact, uncover customer journeys across platforms, evaluate overall user experience all in one unified package. It also eliminates one more confusion area of digital analytics—scoping concerns!

Ultimately, we’d recommend using the transition to Google Analytics 4 Properties as a time to re-evaluate your data collection and reporting strategy as a whole. If you can shift your thinking to an event-based world, you may be better prepared to take advantage of any new reporting that comes along in the Google Analytics 4 Properties interface.

Gotchas When Working With BigQuery Google Analytics 4 Properties Data

The following list represents some high-level challenges or "gotchas" that we’ve faced when reporting on our Google Analytics 4 Properties data in BigQuery.

Data Differences Between the Interface & BigQuery

If you’ve worked in BigQuery before, you are familiar with the differences between the raw hit-level export data vs. the aggregated reports in Google Analytics. This aggregation is one of the reasons that we generally need to recreate several common metrics in BigQuery. The data model and export schema aren’t actually "missing" data, instead, the Google Analytics reports are pre-aggregating data for us within the interface.

Because Google Analytics reports consist of pre-aggregated data, there are often small differences (~1-2 percent) between our calculations in BigQuery and the Google Analytics interface reports. Most of this still holds true for our Firebase and Google Analytics 4 Properties export, especially relating to users.

Users in BigQuery vs. Google Analytics

One of the most common differences that you will see in BigQuery vs. Google Analytics is the user count. The pre-aggregated reports in Google Analytics rely on a probabilistic cardinality estimation algorithm called HyperLogLog ++. This algorithm approximates our user count in the analytics interface, and generally is within an acceptable range (~1-2 percent).

However, when we calculate our users in BigQuery, we rely on the underlying hit-level data. While our numbers in BigQuery may not exactly match what we see in Google Analytics, our BigQuery calculations are more accurate! Because the aggregated reports are run daily, as you increase your date range you may find that the percentage variance increases exponentially over time.

In summary, Google’s pre-aggregated tables enable Google Analytics to serve us data more quickly, but often take a few steps to recreate in BigQuery. We may also find that with aggregation over time comes less accuracy, so don’t fret if you see differences in BigQuery and Google Analytics over time.

App Data: Your SDK Version Matters

When analyzing Firebase mobile app data, your software development kit version matters! There are two layers of SDK consideration: 1) the version of Firebase Analytics installed within your app combined with 2) the version of the app that your users access on their devices. Both of these SDK considerations will often make a difference in the data that you see in BigQuery.

This is because the Firebase Analytics SDK (and Core SDK) experience numerous releases and enhancements throughout the year. These releases and enhancements generally result in newly available features and data, such as the more recent additions of the session ID or eCommerce objects. If your app is using an outdated version, or if your users are using an outdated app installation (even if your most recent push uses up-to-date analytics), you may see data discrepancies between users.

If a recent release, such as enhanced eCommerce, is of utmost importance to your organization, you can always push a forced upgrade to your app users. However, consider the implications of your user experience before doing so.

eCommerce: There’s a New Array for That!

Speaking of eCommerce, Google has released a new items array for Google Analytics 4 Properties website (Google Tag Manager or gtag) implementations and Firebase app projects. This array-based tracking method is very similar to our existing eCommerce implementation in Universal Analytics, but with a few caveats.

To get the most out of your enhanced eCommerce will require a few queries as well as a visualization tool of your choice (perhaps Data Studio or Tableau, for example).

In BigQuery, we’ll access our eCommerce array values with each associated eCommerce event. For web hits, the eCommerce gtag documentation does a nice job of summarizing the types of events that you may want to collect, including view_item, add_to_cart, and purchase

Note: eCommerce measurement requires you to link your Firebase project to an Analytics account and to have the Android SDK v17.3.0 or iOS v6.20.0 and up in your app.

Sample Query 5: eCommerce in BigQuery

In this sample query, let’s take a look at UNNESTING eCommerce values within a purchase event. To do so, we’ll use the key-value pairs that we’d like to report on, such as transaction_id, and add in our associated transaction metrics from the BigQuery export schema.

For eCommerce, there are two data types that we will primarily use: ecommerce and items. The ecommerce.() dimensions store transaction information, or are records surrounding items like revenue, quantity, and total items. These values are already "flattened" and are directly associated with the purchase event in BigQuery.

Using this information, we can isolate our transactions by date and transaction_id, which will give us a high-level understanding of performance.

SELECT
  event_date AS date,
  (
  SELECT
    value.string_value
  FROM
    UNNEST(event_params)
  WHERE
    key = "transaction_id") AS transaction_id,
  ecommerce.total_item_quantity AS quantity,
  ROUND(ecommerce.purchase_revenue, 2) AS revenue,
  ROUND(ecommerce.shipping_value, 2) AS shipping,
  ROUND(ecommerce.tax_value, 2) AS tax,
  ecommerce.unique_items AS unique_items
FROM
   `[].analytics_[].events_*` -- modify to your project
WHERE
  event_name = "purchase"
ORDER BY
revenue DESC

Sample Query 6: The Items Array in BigQuery

We also have product-level information in our items.() dimensions. These items are also associated with an event in BigQuery.

Let’s take a look at the view_item_list event, which represents an impression of a product/item list on your website. Here, we use our event in our WHERE clause, but are also UNNESTING by the items array:

SELECT
  event_date AS date,
  event_name,
  items.item_id, 
  items.item_name,
  items.item_brand,
  items.item_variant, 
  items.item_category,
  items.item_category2,
  items.item_category3,
  items.item_category4,
  items.item_category5,
  items.price,
  items.quantity, 
  items.affiliation,
  items.location_id
FROM
  `[].analytics_[].events_*`, -- modify to your project
  UNNEST(items) AS items
WHERE
  event_name = "view_item_list"

This query returns a very streamlined table, which shows us a unique row for each event. We could also SUM() these values to count the number of items lists that were viewed for future analysis.

Final eCommerce Observations

Because there are limited (as of Q3 2020) aggregated eCommerce reports within the Google Analytics 4 Properties interface, you will want to be extra cautious with your query building. Essentially, there may not be a pre-fabricated report to validate your query findings against, so be sure to run your new eCommerce implementation in parallel with your Google Analytics 360 properties for peace of mind.

Final Comparisons Between Google Analytics 4 Properties & Universal Analytics (GA360) in BigQuery

We’ve covered a lot of material relating to the new Google Analytics 4 Properties and BigQuery strategies. However, if you are already an avid user of the tool for Google Analytics 360 data, but need a guideline on the differences between the two tables, you may find the following comparison sheet useful as you make the switch to the new data model.

Topic Google Analytics 4 Properties Events Table Google Anlaytics 360 Sessions Table
Cost Free Connection (integration), with associated query & storage costs. Connection included with Google Analytics 360 subscription (integration), with associated query & storage costs.
Integration Process Occurs in the Firebase Console or in the Google Analytics 4 Property Interface. Occurs directly in the Google Analytics Interface under Property Settings.
Table Structure In our GA4 & Firebase tables, each row represents a single event.  In the Universal Analytics tables, each row represents a single session.
Data Studio Connectivity Google Analytics or BigQuery Connector. Google Analytics or BigQuery Connector
Custom Event Reporting Each event_name will be unnested with associated event parameters. Each Event Category, Action, and Label are separate hit-scoped fields.
Custom User Reporting Each user_property will be unnested with associated values. Each User-Scoped custom dimension must be unnested.
eCommerce Reporting The items.() array and eCommerce.() data can be accessed for each event, and unnested. eCommerce data must be accessed through totals.(), hits.(), or product.() nested tables. 

Reference Material & Additional Reading

While we hope this post has been comprehensive, we know that there is so much to learn when considering Google Analytics 4 Properties and BigQuery. The following represents a few great resources to continue your education: