Multi-Session & Multi-Channel Funnel Reporting in Google Analytics BigQuery

October 01, 2019
By Abigail Matchett,
Senior Analytics Consultant

Let’s face it, most of us want more from our acquisition reporting. We preach the value of omnichannel marketing until our throats run dry. We utilize the built-in Multi-Channel Funnel reports within Google Analytics at every relevant opportunity. We open up our organizations to the vast world of conversion path analysis, touting multiple touchpoints, and mediums as the success of our marketing campaigns. 

But what happens when we want to analyze channel performance of micro-conversions, events without associated Goals, or feature performance categorized outside of our predefined goals or paid media efforts? Right now, our default Analytics reports just can’t swing it. Enter Google BigQuery, the knight in shining armor to our multi-session acquisition question.

What Google Analytics Can Do For Us

Before we offer up an alternative solution (and bring BigQuery into the mix), it’s always good to know what we can get out of the box. Remember, free users of Google Analytics won’t have access to BigQuery reporting — but there’s still plenty that you can do! 

Standard Google Analytics Reporting

Google Analytics (GA) provides us with default multi-channel reporting, summarizing top conversion paths, assisted conversions, and path length reports. If you haven’t checked these out, head over to GA and get a move on! 

Multi-Channel Funnel API

If you’ve started to venture outside of the interface (maybe you’ve used the Google Analytics Sheets extension, or dabbled in Google Data Studio), chances are you’re ready to visualize these funnels outside of the interface. If so, it’s your lucky day: the Multi-Channel Funnels API enables you to pull key dimensions into external reports.

If an API sounds difficult, don’t fret: my colleague Sam wrote an awesome post about effectively using the API in Google Sheets & Data Studio

What We Can Do in BigQuery

While BigQuery does not provide us with direct access to the Multi-Channel Funnel (MCF) reports, we can create multi-session and multi-channel acquisition reports using the BigQuery GA Sessions Table. 

It is important to note that we’ll need to apply additional logic in BigQuery to replicate our MCF reports. Said differently, the default fields in BigQuery aren’t formatted initially to be an exact match to the MCF reports. This is because the trafficSource fields found in the BigQuery export differ from the fields in our MCF reporting. In the Multi-Channel Funnels reports, any session that is identified as direct traffic is attributed to the direct channel. In our BigQuery export (and in Google Analytics), all sessions will be attributed to the previous non-direct source.

What does this mean for those of us in our quest for multi-channel funnel reporting? We’ll need to use an IF clause that evaluates if our traffic source is truly a direct acquisition method (more on this in the coming sections). “Why,” you ask? In a nutshell, the source, medium, and campaign fields overwrite any direct traffic and replace this traffic with the most recent source, medium, or campaign (if one is available). For both BigQuery and Google Analytics, all source, medium, and campaign fields follow Last Non-Direct Click attribution rules. By default, campaigns timeout in GA after six months, unless otherwise specified. 

Attribution in GA can be a little tricky, but lucky for us we’ve written some comprehensive posts on the subject.

Defining Our Dimension Requirements

Let’s dive in. For any analysis, it is important to understand the dimensions and metrics that we want to capture before we generate our structured query language (SQL). To understand multi-session acquisition, we’ll need to see at least one traffic source in our requirements document. Typically, this would be the source, medium, or campaign for the session:

    trafficSource.source AS Source,
    trafficSource.medium AS Medium,
    trafficSource.campaign AS Campaign,
    channelGrouping AS Channel_group

You can also concatenate the source/medium information using the syntax below:

trafficSource.source + ' / ' + trafficSource.medium AS source_medium

For multi-channel funnel reporting, we’ll need to add an additional IF clause to evaluate if our traffic source is truly a direct acquisition method. The trafficSource.IsTrueDirect field is a boolean value that returns TRUE if the traffic source is actually a direct source, regardless of GA’s default last non-direct click attribution.

  IF(trafficSource.isTrueDirect, "(direct)", trafficSource.source) AS Source,
  IF(trafficSource.isTrueDirect, "(none)", trafficSource.medium) AS Medium,
  IF(trafficSource.isTrueDirect, "(not set)", trafficSource.campaign) AS Campaign,
  IF(trafficSource.isTrueDirect, "Direct", channelGrouping) AS Channel_Group

The next step is to convert these items into the same format as the Multi-Channel Funnels reporting, creating a string of sequential acquisition methods. We’ll use a STRING_AGG function, which takes our sources, mediums, campaigns, or channel groupings from all of our sessions (within our designated time window) and creates one row of data:

STRING_AGG(channelGrouping, ' > ') AS Channel_funnel,
STRING_AGG(medium, ' > ') AS Medium_funnel,
STRING_AGG(source, ' > ') AS Source_funnel,
STRING_AGG(campaign, ' > ') AS Campaign_funnel

Once we have our session acquisition information, we’ll also want to tie back these dimensions to at least one action that constitutes success on our website (such as a goal or an event). If you’re familiar with the BigQuery schema, you’ll notice that Google Analytics Goals are missing in action. However, BigQuery is extremely flexible, and these configurations that occur in the interface can be recreated.

For example, you may have a Destination goal in Google Analytics for a form submission thank-you page. We can recreate this goal in BigQuery using an IF function, and assign a binary 1 or 0 success metric for analysis:

SELECT
      MAX(
      IF
        (page.pagePath = '/thank-you',
          1,
          0)) 
FROM
        UNNEST(hits)) AS Goal_1_Form_Submissions

Events are also a great way to assess micro or macro conversions. Similar to capturing a destination, we can assign a binary 1 or 0 success metric for completing an event within a session:

SELECT
      MAX(
      IF
        (eventInfo.eventCategory = 'Contact Us'
          AND eventInfo.eventAction = 'Email'
          AND eventInfo.eventLabel = 'info@bounteous.com',
          1,
          0))
    FROM
      UNNEST(hits)) AS Contact_Bounteous_Info_Email

Additionally, if you have an eCommerce website, you may be interested in transactions or revenue data. We can supplement our query conversion data with totals.transactions and totals.transactionRevenue.

A key benefit of using BigQuery for this type of analysis is that we can select multiple events and conversions for each user. This gives us the ability to compare and contrast behavior by channel, and include more diversity into our analysis.

We could stop right here, compile our requirements above in a summary query, and end up with multi-session information for form submissions and contact emails. However, one of the beautiful features of BigQuery is that we can isolate specific users, diving one level further in our conversion success process.

User-Level SQL Requirements

For any user-level analysis, it will be a requirement to group information at the user level. We have two options for user-level analysis in BigQuery: the Client ID or a site-generated User ID. 

The Client ID has been available as a field in BigQuery since May 24, 2018, and is the browser and device cookie that Google sets for each user. To access this in BigQuery, we use the clientId field.

If you have a user ID enabled view (and if BigQuery is linked to this view), you can pull in this value via the userID field in BigQuery:

userId AS Internal_User_Id

If you are linked to a non-user ID enabled view, but you store the user ID field in a custom dimension, we can grab this value by the index:

SELECT
      x.value
    FROM
      UNNEST(hit.customDimensions) AS x
    WHERE
      x.index = 1) AS Internal_User_Id, #where index 1 is your custom dimension index in Google Analytics for the UserId.

Additionally, if we want to specify the order of session acquisition (at the user level) within our MCF reporting, we can utilize Analytic Functions within BigQuery. 

OVER Clause

One of the most popular and useful functions for GA data is the OVER clause, which creates a window that our data can be analyzed across. The OVER clause also determines how the rows are partitioned into smaller sets. Within this clause, we’ll partition by our clientId (or the userID from above if you have it) and the visitNumber. In our query, the visitNumber gives us the incremental session number over the date range that we will specify. For example, if this is the users second visit, the visitNumber will be 2. We use ROWS UNBOUNDED PRECEDING as well to specify that we want to aggregate by all previous rows (previous sessions).

STRING_AGG(channelGrouping, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Channel_funnel,
STRING_AGG(medium, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Medium_funnel,
STRING_AGG(source, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Source_funnel,
STRING_AGG(channelGrouping, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Campaign_funnel,

Putting it All Together

Once we’ve identified our dimensions, conversions, and granularity level of the report, we can put the code together and run our query. I’ve recently been a big fan of creating subqueries using a WITH clause to organize our data, so in the example below, you will see three separate steps.

The first part of our query grabs every session, conversion, and acquisition channel within our TABLE_SUFFIX date range. We also pull in any conversions or micro-conversions that we are trying to analyze. We’ve named this subquery session_data to give some context for what we are pulling. 

The subquery path_data applies our aggregation function, grouping acquisition data by our Google Analytics Client ID. And finally, we select all fields from our path_data subquery, specifying users where we have completed at least one of our conversions outlined during the session_data portion of our query.

## PULLS DATA FROM GA_SESSIONS TABLE TO DETERMINE MCF ACQUISITION
WITH
  session_data AS (
  SELECT
    date,
    clientId,
    visitNumber,
    IF(trafficSource.isTrueDirect, "(direct)", trafficSource.source) AS Source,
    IF(trafficSource.isTrueDirect, "(none)", trafficSource.medium) AS Medium,
    IF(trafficSource.isTrueDirect, "(not set)", trafficSource.campaign) AS Campaign,
    IF(trafficSource.isTrueDirect, "Direct", channelGrouping) AS Channel_Group,
    totals.transactions AS Transactions,
    totals.totalTransactionRevenue AS Revenue,
    (
    SELECT
      MAX(
      IF
        (page.pagePath = '/thank-you',
          1,
          0))
    FROM
      UNNEST(hits)) AS Goal_1_Form_Submissions,
    (
    SELECT
      MAX(
      IF
        (eventInfo.eventCategory = 'Contact Us'
          AND eventInfo.eventAction = 'Email'
          AND eventInfo.eventLabel = 'info@bounteous.com',
          1,
          0))
    FROM
      UNNEST(hits)) AS Contact_Bounteous_Info_Email
  FROM
    `your-project.your-table.ga_sessions_*`,
    UNNEST(hits) AS hit
  WHERE
    _TABLE_SUFFIX >= '20190901'
    AND _TABLE_SUFFIX <= '20190905'
  GROUP BY
    date,
    clientId,
    visitNumber,
    Source, 
    Medium,
    Campaign,
    Channel_Group,
    Goal_1_Form_Submissions,
    Contact_Bounteous_Info_Email,
    Transactions,
    Revenue) (
  WITH
    path_data AS (
    SELECT
      clientId AS Google_client_Id,
      STRING_AGG(Channel_Group, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Channel_funnel,
      STRING_AGG(Medium, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Medium_funnel,
      STRING_AGG(source, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Source_funnel,
      STRING_AGG(Campaign, ' > ') OVER (PARTITION BY clientId ORDER BY visitNumber ROWS UNBOUNDED PRECEDING) AS Campaign_funnel,
      SUM(Goal_1_Form_Submissions) AS Goal_1_Form_Submissions,
      SUM(Contact_Bounteous_Info_Email) AS Contact_Bounteous_Info_Email,
      SUM(Transactions) AS Transactions,
      SUM(Revenue/1000000) AS Revenue
    FROM
      session_data AS data
    GROUP BY
      Google_client_Id,
      Channel_Group,
      visitNumber,
      Medium,
      Source,
      Campaign)
  SELECT
    *
  FROM
    path_data
  WHERE
    (path_data.Goal_1_Form_Submissions >= 1
      OR Path_data.Contact_Bounteous_Info_Email >= 1
      OR Path_data.Transactions >= 1))

Let’s Recap

In many cases, we won’t need to abandon the MCF reports that we know and love in Google Analytics. These reports offer us an insightful view into conversion channel performance and help share the omnichannel story. However, there are a few occasions where we might require a little customization in our reporting and trend analysis. By extending our multi-session conversions to events or other non-goal interactions using BigQuery, we can more deeply assess the value of our marketing strategies on a granular level. 

Remember, unless we use isTrueDirect as outlined above, these reports will differ from the MCF API in that BigQuery utilizes Last Non-Direct Click Attribution by default. Depending on your analysis, you may want to keep the Last Non-Direct Click Attribution so that the session funnels match your GA data, so choose your own adventure! Otherwise, if you are hoping to match the MCF reporting, be sure to include the isTrueDirect clause.

Lastly, obtaining acquisition data from BigQuery can help us automate our reporting. Data Studio has a native connector to our BigQuery project, and we can send this data as frequently as we’d like for visualization purposes. We can also employ and utilize BigQuery’s API to send our multi-session attribution data to any visualization tool within our organization (such as Tableau, Power BI, and many more).