Regular Expressions Part XIII: Good Greed

January 11, 2007

regular-expressions

This is my next-to-last post in this Regular Expression (RegEx) series. I have been thinking about it for a long time, and yesterday someone asked me a question that finally got me to write it. She had two pages that she wanted to roll into one Google Analytics goal. She created the Regular Expression for it and ran it through Epikone’s RegEx Coach, where it worked, but it wasn’t working in GA. (More on the Coach below.)

The two pages were:

subdomain.mysite.com/folder/subfolder/GoalThree.php
subdomain.mysite.com/folder/subfolder/GoalThreesome.php

She sent me a long, complicated expression which wasn’t working for her and asked my opinion.

This is absolutely a case of putting Good Greed to work for you, as we will see in a minute. As I wrote in my last post, Regular Expressions are very greedy: they match everything they can unless you tell them not to. This is a very hard concept to wrap your head around. It means that, among other things, the expression can match anywhere in the text, and whatever comes before it and after it is accepted too (unless you tell it not to, or there is nothing there to match).
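To make that concrete, here is a minimal sketch in Python. It is only an approximation: Python's re module is not the engine GA uses, and re.search is my stand-in for the way an unanchored expression is allowed to match anywhere in the text. The bare expression GoalThree is just an illustration.

```python
import re

# One of the two goal pages from the question.
url = "subdomain.mysite.com/folder/subfolder/GoalThree.php"

# Unanchored: the expression may match anywhere in the text.
m = re.search(r"GoalThree", url)
before = url[:m.start()]   # text in front of the match, accepted silently
after = url[m.end():]      # text behind the match, accepted silently
print(repr(before), repr(m.group()), repr(after))

# Anchored with ^ and $: now nothing may come before or after,
# so the same URL no longer matches.
anchored = re.search(r"^GoalThree$", url)
print(anchored)
```

Everything before and after GoalThree rides along for free in the first search. The caret and dollar sign in the second search are how you tell the expression not to allow that.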

Anyway, I wrote her back and said, why don’t you just write an expression like this:

/folder/subfolder/GoalThree

This assumes that she doesn’t have other GoalThree pages that would be incorrectly mixed in. If, for example, she had another page, /folder/subfolder/GoalThreeCornered, it would qualify as a match too (because the RegEx accepts any extra characters before or after the text it describes, even though those characters aren’t in the Regular Expression). Moving back to how simple her RegEx might be: depending on her site, she might even have been able to get away with a goal like this:

/GoalThree

This matches every URL that includes /GoalThree.
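Here is a rough way to check both goal expressions against a handful of pages. It is a sketch in Python, not GA itself: re.search stands in for unanchored matching, the /elsewhere/ page is made up for illustration, and the stricter third expression is my own assumption about how one might exclude GoalThreeCornered, not something from the original exchange.

```python
import re

LONG_GOAL = r"/folder/subfolder/GoalThree"
SHORT_GOAL = r"/GoalThree"
# Assumption: a tighter variant that allows only the optional "some"
# suffix and requires the URL to end in ".php".
STRICT_GOAL = r"/folder/subfolder/GoalThree(some)?\.php$"

paths = [
    "/folder/subfolder/GoalThree.php",      # goal page from the question
    "/folder/subfolder/GoalThreesome.php",  # goal page from the question
    "/folder/subfolder/GoalThreeCornered",  # the unwanted page from the post
    "/elsewhere/GoalThree.php",             # hypothetical page in another folder
]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    return [p for p in candidates if re.search(pattern, p)]

long_hits = matching(LONG_GOAL, paths)      # the first three paths
short_hits = matching(SHORT_GOAL, paths)    # all four paths
strict_hits = matching(STRICT_GOAL, paths)  # only the two goal pages

for name, hits in [("long", long_hits), ("short", short_hits), ("strict", strict_hits)]:
    print(name, hits)
```

The shorter the expression, the more pages it picks up. That is Good Greed when those pages are the ones you want, and a problem when they are not.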

Finally, a word about the Epikone RegEx Coach. I haven’t talked to Justin about this, but I am fairly sure that the Coach is configured to check whether the phrase you type matches the RegEx you type, using the way GA interprets RegEx. That doesn’t mean that you necessarily come up with a valid goal, or an IP address expression that will actually filter anything. For example, you might use it to see whether colou?r matches both color and colour (it should), but that doesn’t mean colou?r will necessarily work in your Google Analytics profile filters or goals. You really have to understand the context in which you are using the expression, and what GA demands of you, in addition to making the expression and the text match each other.
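The colou?r check is easy to reproduce outside the Coach. This is a small Python sketch, with the same caveat as the Coach itself: it only tells you whether the expression and the text match, not whether a particular GA filter or goal field will do what you want with it. The misspelled test words are my own additions.

```python
import re

pattern = re.compile(r"colou?r")  # the question mark makes the "u" optional

words = ["color", "colour", "colouur", "colr"]

# fullmatch requires the whole word to fit the expression,
# so stray extra characters are not accepted here.
results = {word: bool(pattern.fullmatch(word)) for word in words}

for word, ok in results.items():
    print(word, ok)
```

Both spellings pass and the two misspellings fail, which is all a checker like this can promise.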

Backslashes
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
{Braces}
Minimal Matching

Robbin