Extracting Schema & Metadata With Google Tag Manager

November 24, 2014


If you’re evaluating the performance of your site content, it can help tremendously to segment that content into a variety of cohorts. Unfortunately, many website owners have trouble getting enough information about their content into Google Analytics to help them with their analysis.

Some of that information may already be sitting in your site’s markup, such as metadata that describes a page or gives it extra context.

Ultimately we want to bring these additional dimensions about your content into Google Analytics to help with your analysis. One way to do this is by leveraging Schema and Google Tag Manager.

What’s Schema?

If you’re still unaware of Schema, it’s a way of marking up your content so that it is recognized by Google and other search providers. This helps search engines to better understand your content, and hopefully deliver it in a more relevant way to people searching on their systems.

Ultimately, it’s about driving more organic visitors to your website.

Your site may already have Schema in place or you may need to look into adding it. Even if you don’t have some complicated system and developers on hand, there are plugins for platforms like WordPress that are easy to install and can start delivering valid Schema within minutes.

What Does Schema Look Like?

Let’s look at a third-party example from a major online publisher who uses schema on their articles. Below is an edited slew of the meta tags found in the head section of code from a single article.

I’ve actually edited out more than half of the ones they use and modified the tags to keep this more readable and anonymize the original source.

Here is the great part: ALL of this information is accessible from within Google Tag Manager!

We can grab any of this and place it in Custom Dimensions for the user, session, or page within Google Analytics. Note that only the tags with an itemprop attribute are Schema; the rest are ordinary meta tags.

<meta itemprop="alternativeHeadline" content="Marmite Containers Tracking Use With Smartphones" />
<meta itemprop="description" content="New smartphone-connected containers use a bluetooth connection to help consumers track their marmite usage, troubleshoot yeast extract problems, find the best bread to use and more." />
<meta itemprop="genre" content="News" />
<meta itemprop="identifier" content="859393038585930" />
<meta name="pdate" content="20141119" />
<meta name="utime" content="20141120074603" />
<meta name="ptime" content="20141119182137" />
<meta itemprop="datePublished" content="2014-11-19" />
<meta itemprop="articleSection" content="Personal Tech" />
<meta name="author" content="Joe Sixpack" />
<meta name="PT" content="article" />
<meta name="CG" content="technology" />
<meta name="SCG" content="personaltech" />
<meta name="des" content="foodtech" />
<meta name="keywords" content="mobile applications, food, yeast extract" />

There is a ton of valuable information in here. If a meta tag has an “itemprop” attribute, that’s an indication it’s Schema. It’s always paired with a “content” value. We can look for a specific “itemprop” value like “genre” and then grab its “content” value, which in this case would be “News”.

Other meta tags here give us the article ID (identifier), the publish date, the last time the article was updated, the section of the site it belongs to, the author, and the keywords assigned to the article. All of this would be awesome to have in our Google Analytics so we could look at JUST Food Tech content, or Personal Tech content, or just articles written by Joe Sixpack, etc.
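Before building individual macros, it can help to take a quick inventory of what a page exposes. The following is a minimal sketch for that purpose, not part of the setup described in this post: collectMetaValues is a hypothetical helper name, and it takes the list of meta elements as a parameter so it can be fed document.getElementsByTagName('meta') from a browser console.

```javascript
// Hypothetical helper (not a GTM built-in): builds a lookup of meta tag values
// keyed by their itemprop or name attribute.
// `metas` is any array-like of elements exposing getAttribute(), e.g.
// document.getElementsByTagName('meta') in the browser.
function collectMetaValues(metas) {
  var result = { itemprop: {}, name: {} };
  for (var i = 0; i < metas.length; i++) {
    var content = metas[i].getAttribute('content');
    var itemprop = metas[i].getAttribute('itemprop');
    var name = metas[i].getAttribute('name');
    if (itemprop) {
      result.itemprop[itemprop] = content; // Schema-style tag
    }
    if (name) {
      result.name[name] = content; // ordinary named meta tag
    }
  }
  return result;
}
```

Run against the example tags above, the itemprop group should hold entries such as genre and articleSection, and the name group entries such as author and keywords. If a page repeats the same itemprop or name, this sketch keeps the last value it sees.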

Much of this information might originally get thrown onto the webpage for organic search reasons, but there’s no reason we can’t take advantage of that fact.

How To Grab Schema and Metadata in Google Tag Manager

For all the itemprop schema we can simply create and recycle a single macro. Here we’re looking for the publish date (itemprop=”datePublished”) by cycling through the meta tags until we find the one we need. Then, we return its content value.


function() {
  // Cycle through every meta tag on the page
  var metas = document.getElementsByTagName('meta');
  for (var i = 0; i < metas.length; i++) {
    if (metas[i].getAttribute("itemprop") == "datePublished") {
      return metas[i].getAttribute("content");
    }
  }
}

Now that we have this information, we can use this macro in a custom dimension sent with the pageview for the page. If there is no publish date, the macro returns undefined, which doesn't get passed to Google Analytics.


(Reminder: You need to also set up this new custom dimension INSIDE of the Custom Definitions within Google Analytics. Don't forget that part or you won't record anything!)

We can reference the other meta tags in similar ways. Say we wanted to grab the author name. We'd modify the previous custom JavaScript macro to check the name attribute instead of itemprop, look for a meta tag whose name attribute equals "author", and again return its content value.
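A sketch of that author macro, reconstructed from the description above, might look like the following. It is written as a named function that takes doc as a parameter (getAuthorFromMeta and doc are both illustrative names) so it can run outside a browser; in a GTM Custom JavaScript macro you would use an anonymous function() { ... } and read from the global document instead.

```javascript
// Sketch of the author lookup. `getAuthorFromMeta` and `doc` are illustrative
// names; in GTM this would be an anonymous function reading the global document.
function getAuthorFromMeta(doc) {
  var metas = doc.getElementsByTagName('meta');
  for (var i = 0; i < metas.length; i++) {
    // Match on the name attribute rather than itemprop
    if (metas[i].getAttribute('name') === 'author') {
      return metas[i].getAttribute('content');
    }
  }
  // Falls through and returns undefined when no author tag exists
}
```

The same pattern works for any of the name-based tags in the example, such as CG or keywords, by swapping the value being compared.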


Next thing you know, we're creating a bunch of these macros for any information we have on the page! Once we get them into Google Tag Manager, we can send them into our Google Analytics as Custom Dimensions, or Content Groups, or whatever we want.
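Inside GTM the mapping from macro to dimension is done in the tag settings, but it may help to see what that amounts to. In analytics.js, custom dimensions travel as fields named dimension1, dimension2, and so on. The sketch below builds such a fields object and leaves out empty values, mirroring how a macro that returns undefined sends nothing. buildDimensionFields and the index numbers are assumptions for illustration; the real indexes come from your Custom Definitions in Google Analytics.

```javascript
// Hypothetical helper: turns { index: value } pairs into analytics.js-style
// custom dimension fields, leaving out anything undefined or null.
function buildDimensionFields(valuesByIndex) {
  var fields = {};
  for (var index in valuesByIndex) {
    if (!Object.prototype.hasOwnProperty.call(valuesByIndex, index)) {
      continue;
    }
    var value = valuesByIndex[index];
    if (value !== undefined && value !== null) {
      fields['dimension' + index] = String(value);
    }
  }
  return fields;
}

// Example with made-up index assignments (1 = publish date, 2 = author, 3 = section).
var fields = buildDimensionFields({
  1: '2014-11-19',
  2: 'Joe Sixpack',
  3: undefined // nothing found on the page, so this one is skipped
});
```

On a site that loads analytics.js directly rather than through GTM, an object like this could be passed as the last argument of ga('send', 'pageview', fields).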


Data for Analysis

And we start to get some more in-depth data in Google Analytics that we can play with and analyze.