5 Ways To Get Data On PDFs And Other Downloads

December 9, 2013

One of the notable limitations of Google Analytics (GA) is that it does not provide data on non-html pages out-of-the-box. Thus, if your website has PDF files, Word docs, .wmv files, or other downloads, you’ll face a black hole of data.

But there are ways around that.

Recently, we started a project with a client that had a substantial number of PDFs on their site. We went through our checklist for SEO for PDFs and determined the following:

  • The PDFs were worth keeping in PDF format
  • The PDFs needed SEO work, including internal links to other pages on the site
  • We lacked data on PDF usage to help our client determine what users were interested in

To the last point, because so many types of content (reports, magazine articles, studies, etc…) were in PDF form, the client really struggled to understand what content performed the best, making content strategizing extremely difficult. So we had to implement workarounds to obtain as much data as possible.

We’ve written about many applicable workarounds in the past, but today I want to get them together in one place for you for easy reference if you want data on your downloadables. So, using our PDF-focused project as an example, below are 5 ways to get data on non-html files.

1. Use Google Webmaster Tools data to examine Google clicks

This method is the easiest of the bunch, and another example of the many ways Google Webmaster Tools (GWT) is incredibly useful for SEO.

Simply log in to GWT, then in the left navigation, click Search Traffic -> Search Queries. In the Search Queries Report, switch from the Top queries tab to the Top pages tab. Now you can observe the clicks from Google for every page on your site getting organic Google traffic, including PDFs and other downloadable documents. This is different from Google Analytics organic traffic reports, which show visits only to html pages that have the GA JavaScript tracking code in place.

GWT Search Query Report for landing pages
Use GWT Search Query Report to see Google clicks to all landing pages

There are a few ways to get straight to the non-html pages. In the screenshot above, I simply used “Ctrl+F” to find anything on the screen containing “.pdf”. You could also export the data into Excel and separate out the non-html pages that way.
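
If you would rather script that separation than do it by hand in Excel, here is a minimal sketch in Python. It assumes the export is a CSV file with a column named "Page" holding the URL. That column name, the sample rows, and the list of extensions are all assumptions for illustration, so adjust them to match your own export.

```python
import csv
import io
from urllib.parse import urlparse

# Extensions treated as downloads (an assumption; match the files on your site).
DOWNLOAD_EXTENSIONS = (".pdf", ".doc", ".docx", ".wmv")


def is_download(url: str) -> bool:
    """Return True if the URL path ends with one of the download extensions."""
    path = urlparse(url).path.lower()  # ignores any ?query or #fragment
    return path.endswith(DOWNLOAD_EXTENSIONS)


def download_rows(csv_text: str, url_column: str = "Page") -> list:
    """Keep only the rows of an exported report whose URL is a download."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_download(row[url_column])]


# Made-up sample standing in for an exported Top pages report.
SAMPLE_EXPORT = """Page,Impressions,Clicks
http://www.example.com/,1000,90
http://www.example.com/reports/annual-study.pdf,400,35
http://www.example.com/about.html,250,12
http://www.example.com/media/intro.wmv?ref=home,60,5
"""

for row in download_rows(SAMPLE_EXPORT):
    print(row["Page"], row["Clicks"])
```

Matching on the URL path rather than the whole URL keeps a page such as /pdf-guide.html from being counted as a download.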

An even more user-friendly way to use GWT data is to incorporate it into GA, which is super easy. All it requires is for the admin of both GA and GWT to log into GA and, in the left nav, go to Acquisition -> Search Engine Optimization -> Landing Pages. If you’ve never connected GA and GWT you’ll see a screen that states “This report requires Webmaster Tools to be enabled.” Simply click the set-up button and follow the easy instructions.


Once you connect GWT and GA, you can see a report like this, which will enable you to easily look at types of organic landing pages.

GWT Search Query Report in GA
Easy integration of GWT into GA lets you see Google clicks to all landing pages in GA

One limitation of connecting the accounts is that you can only connect one GWT account to one GA account, and a GWT account can only be for one subdomain. This may be an issue if you have multiple subdomains.

2. Use Google AdWords’ Paid & Organic Report for additional data on Google clicks

Another limitation of GWT data on organic clicks is that it will only ever show 90 days’ worth of data. This is one reason I like to download GWT data every month.

If you didn’t save GWT data, you may be happy to find that Google AdWords’ Paid & Organic Report goes back further than 90 days. While not as robust as the Search Queries report, this Paid & Organic Report can help you fill in a few blanks. One caveat, however, is that this AdWords report only goes back to the date at which you enabled the report by connecting AdWords and GWT.

Fortunately, enabling this report is also super easy. Learn more about this report at Find More Precise Keyword Data For Organic Clicks To PDFs.

Another limitation of GWT search query data is that the click numbers are always rounded off (in an idiosyncratic way at that). Note that the AdWords’ Paid and Organic Report has no rounding.

3. Use event tracking in GA to see how often docs are downloaded from your site

So far, we’ve shown how to see traffic from Google to your non-html documents. But we assume you also want to know how often the documents are accessed from your own site. We can do this using event tracking, which also shows which documents were downloaded and which html pages users were on when they accessed them.

It was when it came time to set up event tracking for every link to a PDF on a site that I truly realized Google Tag Manager was the greatest invention since the coffee machine, as we were able to set up tracking for hundreds of links in just a few short minutes.

To learn how, go to Google Tag Manager Auto Event Tracking. Note that Jonathan points out exactly how to track PDF link clicks in his section “Example: Tracking Link Clicks.”

Tag Manager PDF click rule

Once that’s all set up you can get GA reports like this:

PDF Events in GA
Event tracking links to your non-html pages provides a wealth of data
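
To make the logic behind such a rule concrete, here is a rough Python sketch of what the tracking boils down to: decide whether a clicked link points to a document, and if so record the document together with the page the user was on. This is not Tag Manager configuration. The event field names, the "Download" category, and the sample clicks are hypothetical choices for illustration only.

```python
import re
from collections import Counter
from typing import Optional

# Matches links ending in a document extension, optionally followed by
# a ?query or #fragment. The extension list is an assumption.
DOWNLOAD_PATTERN = re.compile(r"\.(pdf|docx?|wmv)([?#].*)?$", re.IGNORECASE)


def download_event(page_url: str, link_url: str) -> Optional[dict]:
    """Return the event data for a download click, or None for other links."""
    match = DOWNLOAD_PATTERN.search(link_url)
    if match is None:
        return None
    return {
        "category": "Download",           # hypothetical naming scheme
        "action": match.group(1).lower(),  # file type, e.g. "pdf"
        "label": link_url,                 # which document was clicked
        "page": page_url,                  # which html page the user was on
    }


# Made-up (page, clicked link) pairs.
clicks = [
    ("/research/", "/files/study-2013.pdf"),
    ("/research/", "/about.html"),
    ("/magazine/", "/files/study-2013.pdf"),
    ("/magazine/", "/files/Issue-12.PDF?dl=1"),
]

events = [e for e in (download_event(p, link) for p, link in clicks) if e]
downloads_per_document = Counter(e["label"] for e in events)

for document, count in downloads_per_document.most_common():
    print(document, count)
```

Recording both the label and the page is what lets a report answer the two questions above: which documents get downloaded, and from which pages.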

4. Campaign tag links inside documents to see how they send traffic to html pages

None of the PDFs for our PDF-heavy client had links back to other pages on the site. Albeit a common issue, this lack of internal linking from PDFs is a big site architecture no-no. From an SEO perspective, it meant that no link equity from the PDFs was passed on to the site. Lack of internal links is also detrimental to user experience and conversion optimization. Thus, we had to edit every PDF so that it links back to the rest of the site.

When we added links to the PDFs, we coded them with campaign tags so we can have data on traffic from PDFs to the rest of the site. This enables us to gain insight on the quality of PDF traffic and the contribution of PDFs to website goals.
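
Tagging many links by hand invites typos, so it can help to generate the URLs. The Python sketch below appends the standard Google Analytics campaign parameters (utm_source, utm_medium, utm_campaign) to a link. The specific values and the example.com URL are made up for illustration. Pick a naming scheme that suits your own reports.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_campaign_tags(url: str, source: str, medium: str, campaign: str) -> str:
    """Return the URL with GA campaign parameters appended to its query string."""
    parts = urlparse(url)
    # Keep existing parameters, but drop any old utm_* tags so they are not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical values: one source per PDF, one shared medium and campaign.
tagged = add_campaign_tags(
    "http://www.example.com/services/?ref=1",
    source="annual-study-pdf",
    medium="pdf",
    campaign="pdf-internal-links",
)
print(tagged)
```

Using a distinct source value per PDF is one way to see which documents send the most traffic back to the site.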


5. Track all hits to documents server-side

Using the above-mentioned 4 techniques, you can view data on Google organic clicks, data on how people access documents from your site, and data on how people access your html pages from your documents.

However, we still don’t know total “hits” to your documents. We don’t know how often non-Google, external traffic sources are sending visits to your non-html documents.

Alex Moore has a solution to this issue at: Tracking PDFs And Other Downloads Inside Google Analytics. Basically, it is a PHP library that you integrate with Google Analytics using an .htaccess rule.

This solution will enable us to see inside Google Analytics how many times each PDF is downloaded (viewed). Note that only “hits” will be displayed, not visitors, bounce rate, time on page or other metrics. Note also that you’ll need to create a separate Google Analytics property to implement this solution, since it could interfere with data on your existing property. But you’ll finally get easy readings on total traffic in GA.
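
If you want a rough cross-check on those totals, your web server's access log already records every request for a document. The Python sketch below is not the PHP library described above. It is a separate, simplified approach that assumes a log in the common Apache access-log layout and counts successful GET requests for PDFs. The sample log lines are made up.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Pulls the requested path and the status code out of a log line such as:
# 203.0.113.5 - - [09/Dec/2013:10:15:32 +0000] "GET /files/a.pdf HTTP/1.1" 200 51234
REQUEST_PATTERN = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def count_document_hits(log_lines, extensions=(".pdf",)) -> Counter:
    """Count successful (status 200) GET requests per document path."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match is None:
            continue  # not a GET request, or not in the assumed layout
        target, status = match.groups()
        path = urlparse(target).path  # drop any query string
        # Only full responses are counted; partial (206) responses are ignored here.
        if status == "200" and path.lower().endswith(extensions):
            hits[path] += 1
    return hits


# Made-up sample lines in the assumed layout.
SAMPLE_LOG = [
    '203.0.113.5 - - [09/Dec/2013:10:15:32 +0000] "GET /files/study.pdf HTTP/1.1" 200 51234',
    '198.51.100.7 - - [09/Dec/2013:10:16:01 +0000] "GET /about.html HTTP/1.1" 200 8211',
    '198.51.100.7 - - [09/Dec/2013:10:17:45 +0000] "GET /files/study.pdf?src=mail HTTP/1.1" 200 51234',
    '192.0.2.44 - - [09/Dec/2013:10:18:02 +0000] "GET /files/missing.pdf HTTP/1.1" 404 512',
]

for path, count in count_document_hits(SAMPLE_LOG).most_common():
    print(path, count)
```

Like the GA approach, this yields hits only, and it will include requests from crawlers unless you filter them out.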

Just because a page isn’t in html doesn’t mean you can’t get data. With the techniques above, you can get:

  • detailed data on how people reach your non-html documents from Google.
  • detailed data on how people access non-html documents from your site.
  • detailed data on visits from non-html documents to your html pages.
  • total hits to non-html documents.