The Value Of Google BigQuery And Google Analytics 360
The integration between Google Analytics 360 and BigQuery is perhaps the most empowering feature in all of web analytics. (There, I said it!) Its hit-level data and cloud-based infrastructure give BigQuery analysis capabilities not found in other web analytics platforms, including both free tools and paid. BigQuery can be the link between third-party data and marketing analytics data. It is the facilitator of advanced data science techniques, and is our preferred data source for visualization.
We work with organizations in a variety of industries that are looking for ways to find deeper insight in their data and turn to Google Analytics 360 for a variety of reasons. For many, the link to BigQuery is actually the reason why they have adopted the platform and in some cases they use BigQuery more than the interface itself.
The Google Analytics 360-to-BigQuery integration serves three primary purposes: querying raw data, connecting with other data sources, and exporting data for visualization. Each capability builds off of the last, and each feature further extends all that Google Analytics can do.
Querying Raw Data
BigQuery is a database, hosted in the cloud. With your subscription to Google Analytics 360, your Analytics data is exported, hit by hit, into BigQuery for you to query, just as you would query a SQL database. The data that comes into BigQuery is raw, hit-level data.
By comparison, inside the Google Analytics interface the data you see is session-based and aggregated. That’s fine for simple marketing questions we might have. For example, in Google Analytics we can easily count the number of sessions that came from a mobile device. But if we wanted to count the number of video play events by a particular user, across multiple sessions, that would be much more difficult to answer.
The Google Analytics interface is relatively easy to use and has a number of tools to make it easy to perform on-the-fly analysis. In order to keep the interface as fast as possible, there are certain limitations in the ways you can access your data and how much you can customize the interface.
This is where BigQuery really shines. You’re using the same underlying data as Google Analytics, but you don’t have the same limitations. Let’s look at some of the limitations that BigQuery overcomes.
Sampling, What Sampling?
One of the most noticeable limitations within the Google Analytics interface is “sampling,” which will kick in when you try to run a complicated or customized report, or a large date range. When this happens, Google extrapolates the data you see by counting only some of your data points and modeling the rest. The result is that the data you are looking at is, in some cases, merely an approximation.
For the majority of users and queries within the interface, this translates to a better user experience and more immediate results. Sampling is improved when you move from the free version to Google Analytics 360: it typically kicks in with non-standard queries when there are over 500K sessions at the property level. With GA360, that increases to 100M sessions at the view level.
While there are ways to combat these sampling limitations in the interface, with BigQuery we can circumvent sampling. We have all of the raw data, so we can query it, slice it, or dice it, any which way, and still have 100% real data in our results.
Focusing On Users Instead of Sessions
In the Google Analytics interface goals and goal funnels are session-based: that means if a person takes multiple visits to complete a task, you won’t see behavior from their earlier session in the goal funnel in Google Analytics; BigQuery has the data and the capability to allow you to surface that information. Likewise, if you’d like to see a goal conversion rate by Users instead (say, what percentage of my users filled out this form?), you’d need to use BigQuery.
This also has a big impact for ecommerce-focused companies. If you use the Enhanced Ecommerce reports inside Google Analytics’ interface, you know that those reports are session-based. But in many real-world scenarios, a person might add an item to their cart in one visit and wait to complete the purchase in another visit. BigQuery allows you to see purchasing behavior from users who take more than one session to pull the trigger.
But what about using User segments in Google Analytics? I’m hearing some of you ask. User-based segments in the interface are restricted to a 90-day lookback window; if the interaction you’re looking at occurred before that, you can’t see it at all. BigQuery’s data never goes away. You can create complex segmentation rules, going back in time as far as you like, or even create dynamic segments where one specific behavior is a requirement for another — imagine using variables in segments. You can do that in BigQuery.
Google Analytics’ data model is structured so that session-based dimensions (like source/medium) don’t play well when combined with user-level or page-level dimensions and metrics. And there’s a limit to the number of dimensions we can see side-by-side: two dimensions is usually our limit in the interface, five in custom reports, and the API allows for seven dimensions. With BigQuery, there are no such limitations.
For example – ecommerce customers may have trouble pulling exact stats of the number of users from Social Media that saw a product page and then subsequently purchased the same product. BigQuery users can handle that with a single query.
Going Forward By Going Backwards
Google Analytics has a very specific way that it processes data and your configuration settings, like Goals and Filters. As data is collected from your site or app, GA will apply those settings and store the finished, altered data for you to use in your reports later on. Because of this, it’s not possible to go backwards and change your data in Google Analytics.
However, with BigQuery you can essentially rewrite history! If you made a tracking error in the past, and you’d like to filter out or modify data (page paths, events, entire sessions), you can dynamically adjust your queries in BigQuery to account for those issues.
Goals in particular can cause frustration in the GA interface – you might decide after-the-fact that you wanted to set up a Goal for some key action on your site, or find a mistake in a Goal setup. In the GA interface, goals only work moving forward; with BigQuery, we can write a query to calculate goal completions based on past data. Furthermore, we can add complicated funnels or required actions that can extend across sessions in order for a Goal to be considered complete.
The Fun Stuff
Finally, we can use this raw data and another programming language, like R or Python, to perform data science techniques using the raw data from BigQuery. We can derive predictive insights, for example, using statistical models to forecast how many purchases we might expect to have in the coming month, estimate how many leads might result from a new email blast, or look at projecting return on ad spend.
With machine learning algorithms, based on data science models, the data can speak for itself. Hidden in the wealth of data in Google Analytics are what marketers call “personas”: common user characteristics or user behaviors that, when grouped together, form what data scientists call “clusters” of users. We can employ machine learning algorithms to comb through this data in BigQuery and discover something new — or maybe validate what we already knew.
The final outcome of this machine learning might just be a newfound understanding of our idealized “customer journey”, that elusive most-valuable-path (MVP) from one page to another leading up to a conversion.
If you have data scientists on your team, they can help you leverage BigQuery to perform these exciting data science techniques, or you can lean on us. We're doing just this kind of work every day. Check out our most recent case studies to learn more about these solutions that we provided:
- Case Study: Audience Modeling with Google Analytics 360 and Google Cloud Platform
Bounteous & PBS
- Case Study: Mapping Customer Journey Through Content Attribution with Google Analytics 360
Bounteous & The Leading Hotels of the World
As you explore Google Analytics data in Google BigQuery, check out some of our new Google BigQuery Recipes, already created queries that you can copy-and-paste to immediately begin exploring your data in BigQuery.
Connecting with Other Data Sources
If you’re like most companies, the data from Google Analytics is just a small portion of all the data that you rely on to make digital marketing, content, and usability decisions. You probably have data everywhere: in your CRM, your ecommerce platform, DoubleClick, AdWords, third-party advertising platforms, pixel trackers, Facebook, YouTube, and more. The more tools you use, the more value you are able to create… but only if those tools can talk to one another.
When you have such a large amount of data, there’s always a question about where it should live. BigQuery allows us to combine our Google Analytics data with third-party data sources, in one of two ways: you can either import data into BigQuery, or you can export data out of BigQuery.
Bringing Data Into BigQuery
Since BigQuery is a database, it can actually be your single data source of truth. For example, imagine that you have a CRM. You can configure your website to pass in the Google Analytics client ID (cid) into your CRM when a customer record is created. This identifier is unique to every visitor on your website. Once the client ID is in your CRM, you can export your CRM data into BigQuery. The client ID can be used as the “key” from one data source to the other. The result is that you can see customer information alongside your web analytics and marketing data in BigQuery.
The process for getting data into BigQuery will change based on your needs – how often is the data updated, how large is the data, and what internal processes you already have in place. Google BigQuery natively integrates with Google Drive and Google Sheets, which can make bring in files or sheets a breeze. We can use API’s to programmatically update datasources inside of BigQuery, connecting to other cloud-based systems or internal data warehouses.
This is all so important because BigQuery allows you to import personally-identifiable information (PII) into BigQuery, which is not something that is possible in Google Analytics’ interface. Want to understand how your top customers first found your website? Once you’ve integrated your data sets, you can write a BigQuery query to generate a report of which marketing sources produced which customers, combining data from your CRM with the behavioral and acquisition information from Google Analytics. This will also help you get a better sense of your true “customer lifetime value” (CLV).
Connect BigQuery Data With Your System
Perhaps more common, you can export data from BigQuery to another system. BigQuery has its own API which helps explain this process. With the aforementioned client ID import, we can send data from BigQuery into any data warehouse that will accept a CSV import. For example, we might want to summarize web behavior for each customer in Salesforce. We could easily obtain the original source/medium information from BigQuery and export it to Salesforce. With a little programming work, we might even develop a lead-scoring model for every sales prospect. (We actually do this internally ourselves.)
Beyond just the raw data, bringing in external data can help with one of the most valuable uses of BigQuery, audience generation. Following a detailed data science project, we might identify an audience or “persona” in BigQuery that we’d like to pursue. Connect with external data to either give you more info to help with creating the audiences or to help us take those audiences and use those outside of BigQuery.
In addition to identifying the initial audiences, with BigQuery we can also generate dynamic audiences for remarketing, A/B testing and Google Analytics reporting. There are several ways this is possible.
For example, we might choose to combine our Google Analytics data from BigQuery with email addresses or related emails from a 3rd-party system. This integration would enable us to leverage a feature called “Customer Match” in Google AdWords, allowing us to target matched prospects or existing customers. By uploading a list of emails to AdWords, we can create and target audiences of those users in Google Advertising tools like Search, Gmail & YouTube. Or we might want to send this list of email addresses to a mailing service like MailChimp, or a marketing automation platform like HubSpot or Marketo.
At the same time, we might choose to see these audiences back inside the Google Analytics interface. The easy way to do this would be to approximate the characteristics we are looking for – creating a segment that closely matches, or mimics, the BigQuery audiences that we created. There are more complex ways to do this as well, for example using the measurement protocol to send these client IDs back into Google Analytics and define an audience. Again, there are number of possible options with varying degrees of complexity.
Once you have built or imported an audience inside Google Analytics, you can conduct advanced reporting in the interface, or use the audience for remarketing in a platform like DoubleClick. And with Optimize 360, another tremendously powerful Google Analytics 360 Suite product, you can actually target these audiences for A/B testing on your website.
Exporting Data for Visualization
We love that BigQuery can integrate with many other tools, as already mentioned: MailChimp, HubSpot, Optimize, DoubleClick, AdWords, etc. These tools form a virtuous cycle, that can help you better understand your full customer journey. If the dots are connected, the value of all your data is amplified.
But perhaps the most virtuous cycle of all is the connection between Google Analytics, Google BigQuery and Google Data Studio. There is no better way to visualize analytics data.
If your goal is to create one, and only one, “dashboard of record”, this is it. If you’ve combined your 3rd-party data with BigQuery, as mentioned above, you will only reap greater rewards when all of that data is visualized in one place. You can combine your Google Analytics data from BigQuery, unsampled, raw, hit-level, with your customer-connected PII data from BigQuery, and see it all, dynamically, inside Data Studio.
The complicated, behind-the-scenes magic occurs within BigQuery and connects easily into Data Studio – which allows you to bring in tables from BigQuery or even custom queries. Interested? Check out this great post about BigQuery data in Data Studio.
There are other ways to visualize BigQuery data, of course. Tableau has a native connector to BigQuery, as do many other 3rd-party visualization tools, like Shufflepoint for Excel (a Bounteous favorite).
(Did I mention Google Data Studio is free?)
We know that Google has some exciting new tools on the horizon for the 360 Suite, and we are excited to see the benefits continue to grow as additional integrations and features are announced.
One More Thing…
Because BigQuery is hosted on Google’s extremely fast cloud-based server architecture, Google Cloud Platform (GCP), the queries we run can range across billions of rows of data and come back in seconds. There’s nothing to set up, and no virtual machines to manage.
Oh, and with Google Analytics 360 your usage of BigQuery is free, up to $500/month of usage (which equals 25 terabytes of storage or 100 terabytes of queried data). For reference, even our customers with over one-billion hits per month are in the ballpark of $150/month for storage and $50 to $100/month for querying — well within the free quota.
When an organization starts with Google Analytics 360, we can import the last 13 months of Google Analytics data into BigQuery, so you can start writing advanced queries for year-over-year reports immediately.
BigQuery is an extremely powerful tool, perhaps more important to data analysis and reporting than any other feature in Google Analytics 360. We see BigQuery as an intrinsic part of the platform, and a necessary precursor to any mid- to enterprise-level web analytics strategy.