Regular Expressions For GA Bonus 1: {Braces}

June 25, 2007
By Robbin Steif


Before I start – The “Criticize GA Documentation” contest ends on Tuesday, June 26.

We now take a break in our regularly scheduled programming (which was filters for GA). That’s because I need to return to an old topic, Regular Expressions (RegEx) for GA, and add a much-needed post: Regular Expression Braces.

Braces are curly brackets, like this {these are braces}. GA never mentions them. So, I don’t know if they are an unsupported feature, or a problem with the documentation.

Braces repeat the last “piece” of information a specific number of times. They are used with two numbers, like this: {6,8}. That particular example means, repeat the last piece of information at least six times and no more than eight.
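To see that "at least six, no more than eight" rule outside of GA, here is a minimal sketch in Python. Python's re module is only a stand-in; I am assuming GA handles braces the same way, which the GA documentation does not confirm. The letter x and the helper name matches_whole are made up for illustration.

```python
import re

# x{6,8} means: the character x, repeated at least 6 and at most 8 times.
pattern = re.compile(r"x{6,8}")

def matches_whole(text):
    # fullmatch succeeds only if the entire string fits the pattern
    return pattern.fullmatch(text) is not None

# Try runs of 4 through 10 x's and record which lengths are accepted.
results = {n: matches_whole("x" * n) for n in range(4, 11)}

for n, ok in results.items():
    print(n, ok)
```

Only the runs of six, seven, and eight x's are accepted; shorter and longer runs are rejected.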

For example, there is a place across the street here in Honolulu called the Rainbow Bazaar. If I wanted to pull a report with all the correct spellings of their name, I could search the report (in the little box at the bottom of the page in the new GA version). I would use the following RegEx:

baza{2,2}r

This means, pull all the keywords that have a baz followed by at least two and no more than two a’s, followed in turn by an r. Hence, bazaar. (Notice that the a, the last character before the braces, is my last piece of information.) Or I could widen the numbers in those same braces to pull misspellings, a more interesting report.
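Here is a rough sketch of both reports, the correct spellings and the misspellings, again using Python as a stand-in for the GA search box. The keyword list is invented for illustration, and the {1,4} range for catching misspellings is just one plausible choice.

```python
import re

# Hypothetical keywords, made up for this example
keywords = [
    "rainbow bazaar",
    "rainbow bazar",
    "rainbow bazaaar",
    "rainbow bizarre",
]

# Exactly two a's between "baz" and "r"
correct = re.compile(r"baza{2,2}r")
# One to four a's: catches the correct spelling plus nearby misspellings
loose = re.compile(r"baza{1,4}r")

correct_hits = [k for k in keywords if correct.search(k)]
loose_hits = [k for k in keywords if loose.search(k)]

# The misspellings report: matched loosely, but not spelled correctly
misspelled_hits = [k for k in loose_hits if k not in correct_hits]

print(correct_hits)
print(misspelled_hits)
```

Because the braces only govern the a right before them, "bizarre" never matches either pattern: it has no baz to begin with.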

The problem with regular expressions is always knowing what they are “working on.” In this case, what is the last piece of information? A set of square brackets or parentheses would make a piece of information. (And in fact, a great use of braces would be to capture all the IP addresses in a block of 0-255, like this: [0-9]{1,3} . Note the comma between the two numbers, not a dash. It’s true that you will also capture 538 and 627 and all sorts of numbers above 255, but you really don’t care, since the IP block will never go higher than 255, anyway.) In the absence of a well-defined piece of information (defined by parentheses or brackets), you are working with the last character.
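The difference between "last character" and "last piece" is easier to see side by side. This is a small sketch in Python, used here only as a stand-in for GA's engine; the ab patterns and the 192.168.1 block are made-up examples, not anything from GA.

```python
import re

# Braces after square brackets: one to three of ANY digit
octet = re.compile(r"[0-9]{1,3}")

# Braces after parentheses: the whole group "ab" repeated twice
grouped = re.compile(r"(ab){2}")

# No brackets or parentheses: only the last character, b, is repeated
ungrouped = re.compile(r"ab{2}")

# A hypothetical IP block, 192.168.1.0 through 192.168.1.255.
# The dots are escaped with backslashes so they match literal dots.
ip_block = re.compile(r"192\.168\.1\.[0-9]{1,3}")

print(bool(octet.fullmatch("255")))        # a valid octet
print(bool(octet.fullmatch("538")))        # also accepted, as noted above
print(bool(grouped.fullmatch("abab")))     # group repeated twice
print(bool(ungrouped.fullmatch("abb")))    # only the b repeated twice
print(bool(ip_block.fullmatch("192.168.1.42")))
```

The same {2} gives "abab" with parentheses and "abb" without them, which is the whole point about knowing what the braces are working on.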

Here are all the other RegEx posts:

Backslashes \
Dots .
Carats ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes –
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now we will Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
Minimal Matching
Lookahead

Robbin