Eliminating Dumb Ghost Referral Traffic In Google Analytics

March 19, 2015

Since I wrote about Bot and Spider filtering last August, others have published additional posts on the topic, such as:

http://viget.com/advance/removing-referral-spam-from-google-analytics

http://www.analyticsedge.com/2014/12/removing-referral-spam-google-analytics/

Today, I’ll be talking about the newest kid on the block, “Ghost Referral” Traffic, and how to block it from your Google Analytics.

What is Ghost Referral Traffic?

This is traffic that never actually visits your website. Spammers use the Google Analytics Measurement Protocol to send hits directly to Google’s servers with your property ID attached. They’re not bots or spiders, which you can generally block on your site, because they never hit your site.

They often leave a trail in your hostname or referral reports to entice you to click through to their spam-ridden websites.

Filter it out

Several others have suggested blocking this traffic by creating an include filter on Hostname (basically your own website domain) because the current Ghost Referrals are coming from other hosts. If you include ONLY your site hostname, it would prevent some other hostname’s data from showing up in your reports. Here’s the problem though… You can set the hostname in the measurement protocol.

If a spammy spammer is intentionally sending you bad traffic, they can intentionally overwrite the hostname, and suddenly your filters including only your hostname aren’t helping you. The Ghost Referral traffic becomes all but indistinguishable from your normal site traffic and those filters do nothing.
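
To make that concrete, here is a rough sketch of why a hostname filter alone can't be trusted. It only builds the payload and never sends anything. The parameter names (v, tid, cid, t, dh, dp, dr) are the standard Universal Analytics Measurement Protocol ones; the property ID, hostnames, and the two helper functions are placeholders I made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical values, for illustration only.
PROPERTY_ID = "UA-XXXXX-Y"          # placeholder property ID
YOUR_HOSTNAME = "www.example.com"   # stand-in for your real domain


def build_ghost_hit(hostname):
    """Build (but do not send) a Measurement Protocol pageview payload."""
    params = {
        "v": "1",                       # protocol version
        "tid": PROPERTY_ID,             # property the hit is credited to
        "cid": "555",                   # arbitrary client ID
        "t": "pageview",                # hit type
        "dh": hostname,                 # document hostname: whatever the sender claims
        "dp": "/",                      # document path
        "dr": "http://spam.example/",   # referrer that shows up in your reports
    }
    return urlencode(params)


def passes_hostname_filter(payload):
    """Mimic an include-only filter on Hostname (simplified to an exact match)."""
    return parse_qs(payload).get("dh", [""])[0] == YOUR_HOSTNAME


lazy_ghost = build_ghost_hit("some-other-host.example")
spoofing_ghost = build_ghost_hit(YOUR_HOSTNAME)

print(passes_hostname_filter(lazy_ghost))      # False
print(passes_hostname_filter(spoofing_ghost))  # True
```

The second hit never touched your site, yet it sails through the hostname filter because the sender chose the hostname.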

Here’s how you can really eliminate (most of) it

I’d like to suggest a more surefire way to block out traffic from outside of your website with a combination of tracking changes and filters that will work with Universal Analytics.

Step 1 – Set a cookie

The first thing you’re going to want to do is set a cookie on your website. You can do this manually on your site or through something like Google Tag Manager. What matters is that everyone who reaches your website gets the cookie.

Let’s give it a nondescript name like “dev-status” with a value of “march2015”, and make the expiration date well into the future. Whenever someone hits any page on your site you can automatically update this cookie with “dev-status=march2015” and extend its lifespan.
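
If you set the cookie manually on the server rather than through Google Tag Manager, it might look something like the sketch below. It uses Python's standard http.cookies module to produce the value for a Set-Cookie response header. The two-year lifetime and the site-wide path are my assumptions for "well into the future"; the function name is hypothetical.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "dev-status"
COOKIE_VALUE = "march2015"
TWO_YEARS_IN_SECONDS = 2 * 365 * 24 * 60 * 60  # assumed lifetime


def marker_cookie_header():
    """Return the value for a Set-Cookie header carrying the marker cookie."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = COOKIE_VALUE
    cookie[COOKIE_NAME]["path"] = "/"                      # valid on every page
    cookie[COOKIE_NAME]["max-age"] = TWO_YEARS_IN_SECONDS  # lifetime in seconds
    return cookie[COOKIE_NAME].OutputString()


header_value = marker_cookie_header()
print(header_value)
```

Sending this header with every page response re-sets the cookie on each visit, which also extends its lifespan.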

Step 2 – Create a Custom Dimension

Within Google Analytics, create a new Custom Dimension at the Property level. You’ll want to set its scope to User just to be safe. Make sure you write down the index number, because you’ll need it when you configure the tag.

Step 3 – Grab the cookie value

We’ll be implementing this solution through Google Tag Manager, so we’ll create a new Variable/Macro for a 1st Party Cookie with the name of “dev-status”.

[Image: dev-status-cookie]

Step 4 – Pass in the cookie value

We can now reference {{dev-status}} in Google Tag Manager and we’ll get the value of “march2015”. Take this Variable/Macro and place it in a Custom Dimension within your Google Analytics pageview Tag.

[Image: customdimension-devstatus]
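
Google Tag Manager does Steps 3 and 4 in the browser, but the net effect on the hit is easy to sketch: read the cookie, and if it exists, attach its value as the cd<index> parameter. In the sketch below I'm assuming the Custom Dimension got index 1; the property ID, client ID, and function names are placeholders, not anything GTM exposes.

```python
from http.cookies import SimpleCookie

DIMENSION_INDEX = 1          # assumption: the index you wrote down in Step 2
COOKIE_NAME = "dev-status"


def read_cookie(cookie_header, name):
    """Return the value of one cookie from a Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


def pageview_params(cookie_header, hostname, path):
    """Build pageview parameters, adding the custom dimension when the cookie is set."""
    params = {
        "v": "1",
        "tid": "UA-XXXXX-Y",   # placeholder property ID
        "cid": "555",          # placeholder client ID
        "t": "pageview",
        "dh": hostname,
        "dp": path,
    }
    marker = read_cookie(cookie_header, COOKIE_NAME)
    if marker is not None:
        params["cd%d" % DIMENSION_INDEX] = marker
    return params


visitor = pageview_params("dev-status=march2015; _ga=GA1.2.3.4", "www.example.com", "/")
no_cookie = pageview_params("", "www.example.com", "/")

print(visitor.get("cd1"))    # march2015
print(no_cookie.get("cd1"))  # None
```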

Step 5 – Filter out the bad traffic

Back in Google Analytics, create a new filter and include only traffic where your specific Custom Dimension is set to your specific value.

[Image: filter-devstatus]
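
Here's a small simulation of what that include filter does to incoming hits. It is only a model: real Google Analytics filters match on a pattern inside the view settings, and I've simplified that to an exact comparison. The sample hits, the cd1 key (assuming index 1), and the function name are made up for illustration.

```python
DIMENSION_KEY = "cd1"         # assumption: Custom Dimension index 1
EXPECTED_VALUE = "march2015"


def include_filter(hits):
    """Keep only hits whose custom dimension carries the expected value."""
    return [hit for hit in hits if hit.get(DIMENSION_KEY) == EXPECTED_VALUE]


hits = [
    # Real visitor: picked up the cookie on your site, so the tag sent cd1.
    {"dh": "www.example.com", "dp": "/pricing", "cd1": "march2015"},
    # Ghost hit from another hostname, no custom dimension.
    {"dh": "spam-host.example", "dp": "/"},
    # Ghost hit spoofing your hostname, still no custom dimension.
    {"dh": "www.example.com", "dp": "/"},
]

kept = include_filter(hits)
print([hit["dp"] for hit in kept])  # ['/pricing']
```

Both ghost hits are dropped, including the one that copied your hostname, because neither carries the value that only your own pages hand out.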

The Result

Now even if the Ghost Referral is spoofing your hostname (which is relatively easy to do without human intervention), it isn’t hitting your site and passing that relatively innocuous Custom Dimension value of “march2015” so it gets filtered out.

It just makes sense: if someone visits your website, they get the cookie and get counted in your Google Analytics. If someone doesn’t visit your site, they don’t get the cookie, so they don’t get counted.

Keep in mind: if you’re actually sending offline hits on purpose, you can pass the same Custom Dimension value with those hits so that traffic is included as well.

What about the “dumb” part of your title?

This only works if the Ghost Referral doesn’t pass that Custom Dimension value. If the spammer visits your website, inspects the hits you’re sending, and copies the Custom Dimension value, their fake hits will carry it too and will pass straight through the filter.

That’s why I’m calling it “Dumb Ghost Referral” Traffic. It’s traffic that doesn’t come from your site and isn’t even looking at your site or trying too hard. “Smart Ghost Referrals” that mimic even your custom dimensions will be harder for us to detect, because they’ve scanned your site and copied more than a generic pageview.

We do have a method to catch those as well… But that’s for another blog post.