Cleaning Your Data With Google Tag Manager And Google Analytics

September 13, 2018 | Abby Matchett

Data quality issues can plague a Google Analytics account, making analysis difficult, time-consuming, or at its worst, leading to incorrect conclusions. There are many ways to affect the data that you collect and use in Google Analytics, and with recent improvements to Google Tag Manager, it’s even easier to have clean, readable data.

For each data challenge, you can typically address the problem either on the “sending” side or on the “reporting” side, with generally the same final result. Which one you choose is often a question of internal team dynamics – who has the technical know-how, the right access, and the time and focus to address these problems.

Why Should We Format & Filter?

Let’s take a step back to talk about why scrubbing, formatting, and filtering our data is so important in the first place.

Reason #1: Data Consistency

Data consistency is one of the most important components of successful, accurate analysis. Consistent capitalization, page URL structure, and data symmetry allow for comparisons to be drawn and business objectives to be measured and examined. Formatting and filtering enable data consistency, and should be among our closest allies in our quest for analytics accuracy!

Reason #2: Data Relevancy

Making the data make sense to more people is a worthwhile exercise. While undefined or (not set) might be accurate information within certain Google Analytics reports, it’s not easy for a reader to understand why that value appears. Cleaning these values up and putting in human-readable labels can help provide valuable information for future analysis.

Reason #3: Automation

We love automation. Really, we can’t get enough – it makes life so simple! In the past, addressing data quality might require chaining together variables in Google Tag Manager, complicated filters in Google Analytics, or advanced ETL processes in a third-party visualization tool. Cleaning and storing information correctly, using the tools available to us, can save us time and improve efficiency.

Reason #4: It Doesn’t Require A Developer

I considered titling reason #4 “Because you can!” – because it’s true! Filtering and formatting ultimately allow us to clean up messy data that would otherwise skew our reports – and we can sometimes complete these steps without touching a single piece of code or enlisting the help of our trusty development team. Hooray for that!

“Sending” Changes vs “Reporting” Changes

Over the years, we’ve written more than a few posts on the benefits and wizardry of Google Analytics view filters. These live at the “reporting” step and are the last-chance option to sift through the data sent to your Analytics account, retrieve the pieces you need, and ultimately ensure data quality for analysis and reporting.

Until now, view filters were one of the most automated ways to format data to lowercase values, remove those pesky undefined values, and clean up messy URLs or other data. In the past, most formatting & filtering work was completed within the Google Analytics admin window. Filters also help control what data is included or excluded from a particular GA View, though we’ve also made the case that this too can happen at the “sending” step, and be handled partly from Google Tag Manager. Check out our post: A Better Alternative To Exclude Filters in Google Analytics.

Until recently, cleaning up data in Google Tag Manager was a relatively challenging process – requiring nested custom JavaScript variables or lookup tables to perform repetitive cleanup functions. Then, along came the Format Value option for User-Defined Variables in Google Tag Manager, which gives us greater control over cleaning our data before sending information to Google Analytics or other tools. Check out Simo Ahava’s post here: #GTMTips: Format Value Option In Google Tag Manager Variables.

It should be noted that good data doesn’t require Google Tag Manager, and that on-page changes can also help classic JavaScript or gtag.js implementations of Google Analytics. When possible, using naming conventions and standardization at the page level (e.g. classes, data attributes, or data layer variables) will help everything that follows. However, this isn’t always an option, especially if relying on user input or anticipating eventual human error.

Fixing the Data First

Fixing the data at its source, or as close to the source as possible, is often preferred. For us, that might mean using the features available in Google Tag Manager to clean up data before it ever gets processed by Google Analytics. While this potentially shifts the burden to a more technical point of contact, there are a few scenarios where this is especially useful.

While one of the most common uses for Google Tag Manager is to send data to Google Analytics, let’s not forget about the other places that we send data. Often, we have pieces of information that are sent to Google Analytics as well as third-party tags. These might include product/transaction info that is shared with third-party conversion tags, page or section level information shared with retargeting or recommendation engines, or copies of data sent to other analytics, CMS, or CRM platforms.

Additionally, with Google Analytics, data is collected at the Property level and then flows down into the various Views underneath that property. When cleanup occurs at the end of the collection process, at the View level, there can be issues with consistent usage of filters. New Views will have no filters applied, and it’s up to the individual to remember to add the existing data cleanup filters to the correct views.

These scenarios should lead you to prefer a “sending” side solution, when possible, fixing the data in Google Tag Manager or on the page so that it’s consistent in Google Analytics and across other platforms.

Fixing the Data Last

On the other hand, with the ease of Google Analytics view filters, it’s entirely reasonable to handle many of the cleanup items inside of Google Analytics. This can be especially helpful when you don’t have the access or resources to use Google Tag Manager.

Other scenarios may help sway your decision towards GA as well. Consider scenarios where many different data sources are flowing into a Google Analytics property. Perhaps you have multiple sites, apps, or offline data that is being sent to the same Google Analytics property. In this case, it may be easier to add one filter in GA instead of tracking down the implementation across many sites/technologies.

Events are a common area where you may want to encourage some standardization – lowercasing all the event dimensions with a view filter is an easy and quick change, and works across any event from any source. Compare that to the effort of remembering to lowercase all variables for all events set up through Google Tag Manager.
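
To make the lowercasing idea concrete, here is a minimal TypeScript sketch of what a lowercase filter effectively does to event dimensions. The EventHit shape and the normalizeEvent name are assumptions made up for illustration; Google Analytics applies its filters through the admin interface, not through code like this.

```typescript
// Hypothetical shape of an event hit; the field names are illustrative only.
interface EventHit {
  category: string;
  action: string;
  label?: string;
}

// Lowercase every text dimension so "Video" and "VIDEO" report as one row.
function normalizeEvent(hit: EventHit): EventHit {
  const normalized: EventHit = {
    category: hit.category.toLowerCase(),
    action: hit.action.toLowerCase(),
  };
  // Leave a missing label missing rather than inventing a value.
  if (hit.label !== undefined) {
    normalized.label = hit.label.toLowerCase();
  }
  return normalized;
}
```

The point of doing this once, in one place, is that it applies to every event from every source, instead of relying on each tag author to remember it.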

If you’ve followed best practices in Google Analytics and have a Test View created, it’s also a fairly easy and standard process to test new filters on the test view and let them run for a period of time before moving them to your main reporting view. This testing process can be trickier to handle in Google Tag Manager, or may require a greater level of testing sophistication.

Tools For Cleaning Up Data

With those considerations in mind, let’s talk through the various ways to clean up data, using both Google Analytics and Google Tag Manager.

“Format Value” on Variables in Google Tag Manager

For issues where you need to uppercase or lowercase a variable’s value, this is a great new feature. As you create new user-defined variables in Google Tag Manager, look below the variable’s settings for the Format Value option and use the built-in features to standardize the format before using it in Tags. This is great for any text fields, whether you’re pulling them from the data layer, form fields, or elsewhere.

You can also use this feature to clean up unwanted “undefined” or “(not set)” values – helping to make sure that missing data doesn’t muddy your reports or worse, block a hit from sending to Google Analytics. Consider a custom dimension for an Author field on a content website. If we’re missing that specific piece of data, consider substituting a readable fallback like “other” or “Author Not Set.”
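
If it helps to picture what this formatting step does to a value, here is a rough TypeScript sketch. Everything in it (the FormatOptions shape, the formatValue function, the option names) is hypothetical and only mirrors the idea; in Google Tag Manager you configure this with checkboxes and text fields on the variable, not with code. The sketch also assumes a fallback value is passed through as typed, without case changes.

```typescript
// Hypothetical options that mirror the idea of the variable formatting settings.
// This is NOT a Google Tag Manager API, just an illustration.
interface FormatOptions {
  changeCase?: "lowercase" | "uppercase";
  convertUndefinedTo?: string;
  convertNullTo?: string;
}

function formatValue(
  value: string | null | undefined,
  options: FormatOptions
): string | null | undefined {
  // Missing data: swap in the readable fallback if one was configured.
  if (value === undefined) {
    return options.convertUndefinedTo;
  }
  if (value === null) {
    return options.convertNullTo !== undefined ? options.convertNullTo : null;
  }
  // Present data: standardize the case if asked to.
  if (options.changeCase === "lowercase") return value.toLowerCase();
  if (options.changeCase === "uppercase") return value.toUpperCase();
  return value;
}

// Example: the Author custom dimension from the paragraph above.
const authorOptions: FormatOptions = {
  changeCase: "lowercase",
  convertUndefinedTo: "Author Not Set",
  convertNullTo: "Author Not Set",
};
const author = formatValue("Abby Matchett", authorOptions); // "abby matchett"
const missing = formatValue(undefined, authorOptions); // "Author Not Set"
```

Because the cleanup lives on the variable itself, every tag that uses the variable receives the same cleaned value, whether it sends to Google Analytics or to a third-party tool.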

Filters in Google Analytics

Google Analytics filters are great and can cover a number of issues we want to clean up. Beyond lowercasing and uppercasing, they also give us Search and Replace and Advanced filters to replace common misspellings, pull information out, and move things around.

Check out our section on Data Consistency for tips on how to handle:

  • Prepend Hostname to Request URI
  • Lowercase Hostname
  • Lowercase Request URI
  • Lowercase Search Term
  • Lowercase Campaign Dimensions
  • Remove Query String
  • Append Slash to Request URI
  • Search and Replace Filter
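
For readers who think in code, several of the filters listed above boil down to simple string transformations. The TypeScript sketch below is only an illustration of those ideas chained together; the function names are made up, the "looks like a file" check in appendSlash is an assumption of this sketch, and Google Analytics itself configures these as view filters in the admin interface.

```typescript
// Illustrative stand-ins for a few filter ideas; all names are hypothetical.

// "Remove Query String": drop everything from the first "?" onward.
function removeQueryString(requestUri: string): string {
  const q = requestUri.indexOf("?");
  return q === -1 ? requestUri : requestUri.slice(0, q);
}

// "Append Slash to Request URI": add a trailing slash unless one is already
// there or the last segment looks like a file (e.g. "/guide.pdf").
function appendSlash(requestUri: string): string {
  const lastSegment = requestUri.slice(requestUri.lastIndexOf("/") + 1);
  const endsWithSlash = requestUri.charAt(requestUri.length - 1) === "/";
  if (endsWithSlash || lastSegment.indexOf(".") !== -1) return requestUri;
  return requestUri + "/";
}

// "Prepend Hostname to Request URI", with the hostname lowercased.
function prependHostname(hostname: string, requestUri: string): string {
  return hostname.toLowerCase() + requestUri;
}

// Chain the steps in a fixed order, the way a stack of filters would run.
function cleanPagePath(hostname: string, requestUri: string): string {
  const path = appendSlash(removeQueryString(requestUri).toLowerCase());
  return prependHostname(hostname, path);
}

const cleaned = cleanPagePath("WWW.Example.com", "/Blog/My-Post?utm_source=x");
// cleaned is "www.example.com/blog/my-post/"
```

Order matters here just as it does with filters in a view: the slash is appended after the query string is removed, so the two steps don’t interfere with each other.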

View Settings in Google Analytics

In addition to the common filters we can add, the View Settings in Google Analytics can also help us with a few common issues. Head here for common challenges with extra query parameters, the default page name, or site search.

Again, with a challenge like query parameters, you may find this will work best in Google Analytics or in Google Tag Manager. We covered this debate as well in an earlier post.
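
Whichever tool you choose, the underlying cleanup for query parameters is the same: drop the parameters you don’t want and keep the rest. Here is a minimal TypeScript sketch of that idea. The excludeQueryParameters function and the sessionid parameter are hypothetical examples, and the sketch assumes exact, case-sensitive matching on parameter names.

```typescript
// Remove the named query parameters from a request URI and keep the others.
// Hypothetical helper for illustration; matching is exact and case-sensitive.
function excludeQueryParameters(requestUri: string, excluded: string[]): string {
  const q = requestUri.indexOf("?");
  if (q === -1) return requestUri;

  const path = requestUri.slice(0, q);
  const kept = requestUri
    .slice(q + 1)
    .split("&")
    .filter((pair) => {
      const eq = pair.indexOf("=");
      const name = eq === -1 ? pair : pair.slice(0, eq);
      // Drop empty fragments (e.g. from "a=1&&b=2") and excluded names.
      return pair !== "" && excluded.indexOf(name) === -1;
    });

  // Only re-attach the "?" if something survived.
  return kept.length > 0 ? path + "?" + kept.join("&") : path;
}

const searchPage = excludeQueryParameters(
  "/search?q=shoes&sessionid=abc123&page=2",
  ["sessionid"]
);
// searchPage is "/search?q=shoes&page=2"
```

Without this kind of cleanup, the same page shows up as many separate rows in your reports, one for each unique parameter value.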

When Should I Use GTM vs GA?

There’s no one-size-fits-all answer for variable filtering and adjusting, but generally our rule of thumb is to create a strategy, document it, share it with your team, and stick to it! For example, if you decide to use Google Tag Manager for lowercasing variables, make this your official process. Ensure that all users with access to GTM are trained on the tool, and add variable formatting to your publishing checklist.