Regular Expressions Question And GA: Search/Replace

March 20, 2007

This weekend, someone sent me a Google Analytics Regular Expression (RegEx) question. The answer is pretty basic but interesting, and there is something to be learned about one of my favorite tools, the Epikone RegEx Tester.

Q: Hi, I’ve read most of your posts about RegEx, but I still can’t manage to find the right RegEx for one of my filters in GA.

I’d like to use a “search and replace” filter for all the pages whose URLs are either / OR /index.asp (which are in fact: www.my-domain.com and www.my-domain.com/index.asp). Basically, I’d like to have all the pages with both URLs displayed as “the page name I gave” in GA reports. [This is why he wants to use the search and replace filter: to give the pages his chosen name. Robbin]

I have tried several expressions on the RegEx filter tester but none of those seem to work. [Note to Epikone: notice that your tool is now elevated to “the” tester of choice. Robbin]

I tried this one below, but I’m not sure that what the RegEx filter tester tells me means the filter is correct or not (I don’t fully understand how this tool works, especially for the “input string” and “result” fields). [Here is the RegEx he is interested in. Robbin]

^(/|/index.asp)$

When I enter / in the input string, then click submit, the displayed result is Match: /,/

When I enter /index.asp in the input string, then click submit, the displayed result is Match: /index.asp,/index.asp

I don’t know what this result does mean exactly.

Could you tell me if this RegEx (^(/|/index.asp)$) is correct regarding what I’m after, or if it’s wrong and then could you suggest me a working one ?

And here is my answer:

Robbin: Why don’t you first change your default page to be just index.asp? You can do this in settings > edit > then edit again. Telling GA that your default page is index.asp will stop you from getting a page like this: / . This will help you with the search and replace AND help you read your analytics more easily.

Then you can do it the simpleton’s way: ^/index\.asp (You really don’t need the dollar sign unless you have URLs that end in aspx, for example. The backslash just makes the dot match a literal period instead of any character.)
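To see what the dollar sign actually buys you, here is a small sketch in Python. Python's regex engine is not the one GA uses, but for a pattern this simple the two should behave alike. The list of URLs is made up for illustration, and I've escaped the dot so it only matches a literal period.

```python
import re

# Hypothetical request URIs, made up for this illustration.
urls = ["/index.asp", "/index.aspx", "/index.asp?id=3", "/about/index.asp"]

without_dollar = re.compile(r"^/index\.asp")   # anchored at the start only
with_dollar = re.compile(r"^/index\.asp$")     # anchored at both ends

# Collect the URIs each pattern matches.
matched_without = [u for u in urls if without_dollar.search(u)]
matched_with = [u for u in urls if with_dollar.search(u)]

print(matched_without)
print(matched_with)
```

Without the dollar sign, the pattern also picks up /index.aspx and /index.asp?id=3. With it, only /index.asp itself matches, so whether you want the dollar sign depends on whether those other URLs exist on your site and whether you want them renamed too.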

I think if I were wanting to keep both / and /index.asp (a bad idea), my regex would be ^/(index\.asp)?$

It matches the same two URLs as yours, just written a little shorter. Either way, keep the ^ and the $ here: without them, the bare / would match every URL on the site.
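For the keep-both case, here is a rough sketch of what the search and replace amounts to, using the reader's original pattern (with the dot escaped). This is Python rather than GA's filter engine, and the page name "Home Page" and the function name are placeholders I made up.

```python
import re

# The reader's pattern, with the dot escaped so it matches only a literal period.
HOME_PATTERN = re.compile(r"^(/|/index\.asp)$")

# Hypothetical replacement name; use whatever page name you chose.
PAGE_NAME = "Home Page"

def rename_home(uri: str) -> str:
    """Return PAGE_NAME for either home page URL, otherwise the URI unchanged."""
    return HOME_PATTERN.sub(PAGE_NAME, uri)

for uri in ["/", "/index.asp", "/products/", "/index.aspx"]:
    print(uri, "->", rename_home(uri))
```

Only / and /index.asp come back as the new name. The other two pass through untouched because the ^ and $ require the whole URI to match.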

The reason that the Epikone RegEx tester acts the way it does when you write it with parentheses is that parentheses tell GA, “I’ve created a variable.” Here is what Justin the Man said about their RegEx tester and creating variables; I found this in an old email from him:

Justin’s email: “Why our reg ex tester behaves the way it does. Our tester is pretty smart. If your expression matches the input string, then the tester will return the word ‘Match’ along with the part of the string that the expression matched. Now, if you are using parenthesis to store some part of the expression in a variable, the tool will return the value stored in the variable in addition to the part of the string that the reg ex matches.”

There is at least one other way to do this, too. You could go into the part of the code on the homepage that reads urchinTracker() and make it urchinTracker('homepage').

In the process of writing this, I found that there is a whole piece on the Epikone blog about how to interpret their results.

Robbin