GA: Why Do Pages Refer To Themselves?

July 29, 2008

[Screenshot: Content - Navigation report]

About a week ago, I read a post by Avinash that answered GA questions, but when I got to the part about the navigation report (see screen shot, left), I just didn’t agree. The question was, “Navigation summary question – why is previous and next page often the same as the page you are viewing?” Take the report on the left: 6.23% of the pageviews leading to the index page come from the index page itself, and 6.23% of the pageviews following the index page go right back to the index page. A little strange, no?

Why I Was Suspicious of the Original Answer

In his post, Avinash wrote that someone at GA explained what caused this peculiar behavior. Here is how he described it: basically, it is about visitors who look at a regular tagged page and then view a picture from that page in a larger format (the image file itself isn’t tagged). Here is the example he gives:

Visitor Action One (view): /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html
Result: javascript hit generated (data collected)

Visitor Action Two (click): http://www.kaushik.net/avinash/wp-content/uploads/2007/09/web_analytics_1.0.png
Result: NO javascript hit generated (no data collected)

Visitor Action Three (back): /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html
Result: javascript hit generated

Visitor Action Four (click): http://www.kaushik.net/avinash/wp-content/uploads/2007/09/web_analytics_2.0.png
Result: NO javascript hit generated

Visitor Action Five (back): /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html
Result: javascript hit generated

To Google Analytics (or any other Analytics tool), it will look like this:

1) /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html – javascript hit generated

2) /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html – javascript hit generated

3) /avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html – javascript hit generated

</Avinash>
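To make the mechanics of that example concrete, here is a minimal Python sketch of the same visit. It assumes that only the HTML post carries the tracking code, while the .png files (which cannot run JavaScript) record nothing; the helper names `is_tagged`, `recorded_hits`, and `transitions` are hypothetical and are not part of GA.

```python
# Hypothetical model of Avinash's example visit (not GA's actual code).
POST = "/avinash/2007/09/rethink-web-analytics-introducing-web-analytics-20.html"
IMG1 = "http://www.kaushik.net/avinash/wp-content/uploads/2007/09/web_analytics_1.0.png"
IMG2 = "http://www.kaushik.net/avinash/wp-content/uploads/2007/09/web_analytics_2.0.png"

# What the visitor actually did: post, image, back, image, back.
visit = [POST, IMG1, POST, IMG2, POST]


def is_tagged(path):
    # Assumption: HTML pages carry the tag; image files fire no hit.
    return not path.endswith(".png")


def recorded_hits(clicks):
    # Keep only the views that generated a JavaScript hit.
    return [p for p in clicks if is_tagged(p)]


def transitions(hits):
    # Consecutive (previous, next) pairs as the analytics tool sees them.
    return list(zip(hits, hits[1:]))


hits = recorded_hits(visit)
print(len(hits))           # 3
print(transitions(hits))   # two (POST, POST) pairs
```

With the image views invisible, the tool sees three hits on the post in a row, which it can only interpret as the post leading to itself twice.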

This sounded plausible, but too neat. Much too neat for me. What if someone got to one of those pictures (one of those untagged .png pages) and decided to leave the site altogether? If just a single person bailed out, that would make the percentages different. For this explanation to work, every single visitor would have to behave identically: they would all have to look at two pictures and come back to the same page. The pattern would have to be perfectly symmetrical, and it would be up to thousands of humans to keep it that way.

Do you believe that? I didn’t. But I didn’t know the answer.

The Truth According to John (aka Google Analytics Gang Signing)

So yesterday, I was working with John and Jonathan here at LunaMetrics. “Did you see Avinash’s post from a week ago?” I asked them. “Those numbers are WAY too clean. How could a page refer to itself and then refer to itself again every single time?”

John thought to himself for a couple of minutes and then said, “Oh, I get it. Here is what happens. Whenever the page is viewed twice in a row, like with a page reload, the whole thing automatically works.” He put his hands together in the configuration shown on the left. Jonathan nodded wisely. I looked at them like they were nuts.

But ultimately, I understood what he meant:

If a page precedes itself, it also follows itself. That’s what John meant with his fingers: on one side of the report, we see the page preceding itself; on the other side, we see the same page following itself. It is the same story, told twice.

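The key is, you can’t read that report like a clickstream when it involves the same page more than once. Once you stop thinking about it that way, it becomes intelligible. Every time the page is viewed twice in a row, that single pair of pageviews is counted once in the left column (the page preceded itself) and once in the right column (the page followed itself). The page is the same no matter which column of the report it appears in, and the numbers have to match exactly because of that.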
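The double counting can be checked with a small Python sketch. The function below is a hypothetical model of a navigation summary, not GA’s implementation: for each pageview of one page, it tallies the page before it (or an entrance) and the page after it (or an exit). The visits are made up and deliberately inconsistent, including a visitor who bails out.

```python
from collections import Counter


def navigation_summary(sessions, page):
    """Tally what comes before and after each pageview of `page`.

    `sessions` is a list of visits; each visit is the ordered list of
    pageviews that actually fired a tracking hit.
    """
    previous, following = Counter(), Counter()
    pageviews = 0
    for hits in sessions:
        for i, current in enumerate(hits):
            if current != page:
                continue
            pageviews += 1
            # The first hit of a visit has no previous page: it is an entrance.
            previous["(entrance)" if i == 0 else hits[i - 1]] += 1
            # The last hit of a visit has no next page: it is an exit.
            following["(exit)" if i == len(hits) - 1 else hits[i + 1]] += 1
    return pageviews, previous, following


# Hypothetical visits with deliberately mixed behavior.
sessions = [
    ["/index", "/index", "/about"],   # reload, then move on
    ["/about", "/index"],             # arrive from /about, then leave
    ["/index", "/index", "/index"],   # two reloads in a row
    ["/index"],                       # look once and bail out
]

pageviews, previous, following = navigation_summary(sessions, "/index")
print(pageviews)            # 7
print(previous["/index"])   # 3  (index preceded by index)
print(following["/index"])  # 3  (index followed by index)
```

Each consecutive (/index, /index) pair adds one to the left column and one to the right column, so the two self-referral counts match no matter how differently the visitors behave. If the report expresses each row as a share of the page’s pageviews (an assumption about how GA does the math), both sides come out to 3 out of 7 here.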

Still lost? I know that some of you are sitting there nodding your heads, while others are saying, “What is she talking about?” So for the latter crowd, let me describe it in a different way. I hope you won’t mind if I use numbers instead of percentages, just to make this clearer.

Let’s say that Page A refers to itself via a page reload 100 times. And let’s say that the website has only one page: Page A. Conceptually, the report would look like this:

Notice how we get 200 pageviews in the middle of the page (and we know that’s how many there are). Notice how the numbers of pageviews on the left and on the right are symmetrical. And notice how these are two identical pictures that meet in the middle, just like the picture of John’s hands above.
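Here is the same one-page example as a short Python sketch. It assumes the 100 reloads come from 100 separate visits, each one a single load of Page A followed by one reload, which is one way to arrive at the 200 pageviews; the variable names are illustrative only.

```python
# Hypothetical one-page site: 100 visitors each load Page A, then reload it once.
sessions = [["A", "A"] for _ in range(100)]

pageviews = sum(len(s) for s in sessions)
entrances = sum(1 for s in sessions if s[0] == "A")
exits = sum(1 for s in sessions if s[-1] == "A")
# Every consecutive (A, A) pair is one self-referral, counted on both sides.
self_pairs = sum(1 for s in sessions for a, b in zip(s, s[1:]) if a == b == "A")

left = {"(entrance)": entrances, "A": self_pairs}   # what preceded Page A
right = {"(exit)": exits, "A": self_pairs}          # what followed Page A
print(pageviews, left, right)
# 200 {'(entrance)': 100, 'A': 100} {'(exit)': 100, 'A': 100}
```

Both columns add up to the 200 pageviews in the middle, and the self-referral row is the same 100 pairs read from either side.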

So I think I have run out of ways to explain this problem. It is sometimes caused by a reload, and sometimes by the untagged-image behavior that Avinash described. But it never requires thousands of people to behave identically.

And in closing, John wanted me to point out that he is really known for his good looks and not for his gang signs, so here he is.

Robbin