Making URLs Better Through Content Grouping In Google Analytics

February 27, 2014
By Jonathan Weber,
Director of Data Platforms

URLs are often one of the most problematic labels for data in web analytics: they’re messy, full of inconsistency, gunked up with a bunch of query parameters that may or may not be useful to you. It tends to make analyzing your content a mess.

Here, sort this stack of needles.

There are a number of suggestions for cleaning up those URLs (here’s a blog post that’s an oldie but a goodie on the subject, written in 2010 but still useful). Or, if you can alter the code on your site, you can send any data you want, an approach sometimes used to rewrite or clean up the URL mess. That’s gotten much easier with Google Tag Manager, but there are still lots of situations in which you can’t change the code (lack of expertise in your organization, political fights with the IT department, etc.).

Content Groupings are a new feature in Google Analytics that allow you to classify your pages into groups or categories. My colleague Alex recently wrote a great article about how to classify posts on a blog with groupings like day of the week, number of images, and so on using Content Grouping with Google Tag Manager. Let’s look at another application of Content Grouping, one that requires no code changes or GTM at all.

Finding the Hidden Information in URLs

A lot of times there’s information contained in your URLs that is important, but not necessarily easy to sort by in analytics. If you have a nice, clear directory structure on your site, the Content Drilldown report can be great:

/services/google-analytics/
/services/seo/
/services/pay-per-click/
/blog/
/about-us/

The Content Drilldown report lets you roll those up into folders. Great!

But, wait! you say. My URLs have information in them, but not in nice little directory structures like that. Maybe they look like this:

/product_widget.php?type=neutrino&flavor=tau
/product_widget.php?type=quark&flavor=strange
/product_transmogrifier.php?model=cardboard

So, notice that these URLs do indeed have useful information in their structure, but now the Content Drilldown report is useless, because it just looks for folders (separated by a slash).
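To make the contrast concrete, here’s a rough Python sketch of folder-style roll-up. The first_level function is a hypothetical simplification of what a directory drilldown does, not GA’s actual logic:

```python
from collections import Counter


def first_level(page: str) -> str:
    """Roll a page path up to its top-level folder.

    Hypothetical simplification of a directory drilldown: if the path
    contains a folder (a segment followed by "/"), return that folder;
    otherwise the page has no folder and is returned unchanged.
    """
    head, sep, _rest = page.lstrip("/").partition("/")
    return f"/{head}/" if sep else page


site_pages = [
    "/services/google-analytics/",
    "/services/seo/",
    "/services/pay-per-click/",
    "/blog/",
    "/about-us/",
]
product_pages = [
    "/product_widget.php?type=neutrino&flavor=tau",
    "/product_widget.php?type=quark&flavor=strange",
    "/product_transmogrifier.php?model=cardboard",
]

# Folder-structured pages collapse into shared top-level entries.
print(Counter(first_level(p) for p in site_pages))
# Product pages have no folder, so each URL remains its own entry.
print(Counter(first_level(p) for p in product_pages))
```

The three /services/ pages collapse into a single /services/ entry, while each product URL is left as its own entry because there is no folder to roll it up into.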

So what if I want to answer easy questions like…

  • How many pageviews were there to any widget?
  • What about just neutrino widgets?
  • Which model of transmogrifier was more popular?

You can bend over backwards sorting and filtering these URLs in the All Pages report to answer those questions, but you don’t have to. The new Content Grouping features, in addition to letting you group pages by code as mentioned above, also allow you to group pages by patterns extracted from the URLs (using regular expressions).

Content Grouping by Extraction

You can create up to five different Content Grouping dimensions in each view, and each grouping can have any number of groups within it. It’s worth mapping out what you want to know before you get started. For our example URLs above, we might want the following groupings:

  • Product Category: widget, transmogrifier, etc.
  • Product Type: neutrino, quark, electron, etc.
  • Product Flavor: tau, strange, charm, top, etc.
  • Product Model: cardboard, gun, etc.

These pieces of information are all present in the URLs, either as part of the path or a query parameter. We can pull out this information with Content Groupings.
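If you want to double-check where each piece lives before writing any patterns, Python’s standard urllib.parse module can split the example URLs apart. This is only a planning aid run outside GA, and the locate helper is hypothetical:

```python
from urllib.parse import parse_qs, urlsplit

PAGES = [
    "/product_widget.php?type=neutrino&flavor=tau",
    "/product_widget.php?type=quark&flavor=strange",
    "/product_transmogrifier.php?model=cardboard",
]


def locate(page: str):
    """Split a page into its file name and its query parameters."""
    parts = urlsplit(page)
    # The category is embedded in the file name, e.g. "product_widget.php".
    filename = parts.path.rsplit("/", 1)[-1]
    # parse_qs maps each parameter name to a list of values; keep the first.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return filename, params


for page in PAGES:
    print(locate(page))
```

The category sits inside the file name, while type, flavor, and model are ordinary query parameters, which is why the patterns for those three can look so similar.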

To create a Content Grouping, go to the Admin section of Google Analytics. (You’ll need to be an administrator to make these changes.) You’ll notice there’s an option called Content Grouping under the settings for your view. Click the Create New Content Grouping button to get started.

First, you have to give each grouping a name. Let’s start with the first example above and call this one Product Category.

Next, you’ll notice that there are three ways to assign groups: by tracking code (detailed in the previously referenced article), by extraction (what we’ll talk about here), or by rules (basically manually building a set of criteria, like a filter, and giving each set of rules a name).

Extraction Regular Expressions

We’re going to use regular expressions to extract the piece of the URL we want to use as the names of the groups. (If you’re not already familiar with regular expressions, now is a great time to get acquainted.) You extract part of the URL by enclosing that part of the regular expression in parentheses, like this:

/product_(.*?)\.php

Our URLs, remember, are /product_widget.php or /product_transmogrifier.php. Inside the parentheses, .*? matches any character (the dot) any number of times (the star), and the question mark makes it “lazy”, meaning the match ends at the first possible location instead of the last. The backslash in \.php escapes the dot so that it matches only a literal period rather than any character. (The “lazy” bit isn’t vital in this example, but it often is, and it’s a good idea in general when defining Content Groupings. Like anything else in GA, test it out in a test view first and make sure you get it right.) It’s OK that other stuff comes at the end of these URL strings; the pattern only has to be specific enough to isolate the part we want to extract.
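To see why the lazy quantifier matters, here’s a small Python sketch comparing lazy and greedy versions of the category pattern, written with \.php so the dot only matches a literal period. The URL is hypothetical, with a second .php in its query string. Python’s re module isn’t the engine GA uses, but lazy and greedy quantifiers should behave the same way for a pattern this simple.

```python
import re

# Lazy: stop at the first ".php" after "/product_".
LAZY = re.compile(r"/product_(.*?)\.php")
# Greedy: run on to the last ".php" the engine can find.
GREEDY = re.compile(r"/product_(.*)\.php")

# Hypothetical URL whose query string contains a second ".php"
url = "/product_widget.php?return=/product_transmogrifier.php"

print(LAZY.search(url).group(1))
print(GREEDY.search(url).group(1))
```

The lazy version captures just widget, while the greedy version captures widget.php?return=/product_transmogrifier, which would create a junk group.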

So, what this regular expression captures is the part of the URL between the underscore and the .php: the word widget or transmogrifier or whatever else happens to be there. That’s all this grouping needs, so we can save it.

Likewise, here are the patterns for all four example groupings (create a separate grouping for each; remember you get up to five groupings in each view):

  • Product Category: /product_(.*?)\.php
  • Product Type: type=([^&]+)
  • Product Flavor: flavor=([^&]+)
  • Product Model: model=([^&]+)

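The [^&]+ in the last three patterns matches one or more characters other than an ampersand, so each capture stops where the next query parameter begins. More generally, you can extract any piece of a URL that carries useful information for grouping your pages.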
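Before saving the groupings, it can help to check offline what each pattern pulls out of your real URLs. Here’s a minimal Python sketch that runs the four patterns (with the dot in the category pattern escaped) against the example URLs. The extract_groups function is purely illustrative and only approximates how GA applies the patterns.

```python
import re

# The four extraction patterns; the first group of each match names the group.
GROUPINGS = {
    "Product Category": r"/product_(.*?)\.php",
    "Product Type": r"type=([^&]+)",    # [^&]+ stops at the next "&"
    "Product Flavor": r"flavor=([^&]+)",
    "Product Model": r"model=([^&]+)",
}

PAGES = [
    "/product_widget.php?type=neutrino&flavor=tau",
    "/product_widget.php?type=quark&flavor=strange",
    "/product_transmogrifier.php?model=cardboard",
]


def extract_groups(page: str) -> dict:
    """Return the extracted group for each grouping, or None if no match."""
    groups = {}
    for name, pattern in GROUPINGS.items():
        match = re.search(pattern, page)
        groups[name] = match.group(1) if match else None
    return groups


for page in PAGES:
    print(page, extract_groups(page))
```

The transmogrifier page gets no type or flavor, which is expected. One possible refinement that these URLs don’t need: type=([^&]+) would also match inside a parameter such as subtype=, so a pattern like [?&]type=([^&]+) ties the match to the start of a parameter.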

Using the Content Groupings

Now, the unfortunate part of this story, like pretty much all of the things you change under the hood in Google Analytics: it’s not retroactive. These groups are only applied to new data that comes in for your site, not to historic data.

Once you have that data, though, it’s easy to apply these groups to your URLs. You’ll find the Content Grouping drop-down in the All Pages report (or add a grouping to custom reports as a dimension).

Now I can easily see metrics reported by (in this case) Flavor. You can always drill down within a group to see the individual URLs, but the groups make it easy to roll them up in the way you’ve defined.
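The roll-up itself is simple to picture. As a sketch with made-up pageview numbers (the counts and the extra strange-flavored neutrino URL are hypothetical), here’s what summing pageviews by the Flavor pattern looks like in Python. Pages with no flavor are lumped under "(not set)", which is how GA typically labels pages that don’t fall into any group.

```python
import re
from collections import Counter

FLAVOR = re.compile(r"flavor=([^&]+)")

# Hypothetical pageview counts per URL, invented for illustration only
pageviews = {
    "/product_widget.php?type=neutrino&flavor=tau": 120,
    "/product_widget.php?type=quark&flavor=strange": 80,
    "/product_widget.php?type=neutrino&flavor=strange": 40,
    "/product_transmogrifier.php?model=cardboard": 30,
}


def rollup(counts: dict, pattern: re.Pattern) -> Counter:
    """Sum pageviews by the group each URL's pattern match extracts."""
    totals = Counter()
    for page, views in counts.items():
        match = pattern.search(page)
        # Unmatched pages are collected under "(not set)".
        totals[match.group(1) if match else "(not set)"] += views
    return totals


print(rollup(pageviews, FLAVOR))
```

Drilling down is just the reverse: the per-URL counts in pageviews are still there behind each flavor total.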

So, Content Groupings are one more element in your arsenal in the war on bad URLs in your data. I hope they help!