Regular Expressions Part XI: Real Wildcards .*

November 20, 2006

Now we are (I am) ready for a Google Analytics Regular Expression that is truly a wildcard .*

Months ago, I wrote a blog post about Regular Expression Wildcards for Google Analytics. But when I went back to it, it was only semi-intelligible, so I deleted it and created all the Regular Expression building blocks first. If you like, you can read all of them, stretching out over a year:

Backslashes
Dots .
Carets ^
Dollar signs $
Question marks ?
Pipes |
Parentheses ()
Square brackets [] and dashes -
Plus signs +
Stars *
Regular Expressions for Google Analytics: Now let’s Practice
Bad Greed
RegEx and Good Greed
Intro to RegEx
{Braces}
Minimal Matching
Lookahead

Now that you (or perhaps more correctly, I) understand the building blocks, let’s talk about how to create real wildcards.

Most of us are familiar with a star as a wildcard, outside of Regular Expressions. We can search for all the .jpg files on our computer with this: *.jpg, which to us means “get everything.jpg.” However, with Regular Expressions, a star only means repeat the previous character zero times, once, or more than once. In order to make it mean “get everything,” you have to pair it with a dot, like so: .*

Why? Because a dot means match any character. A star means repeat the previous character zero times, once, or more than once. So the combination means repeat any character as often as you like, i.e., get everything.
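Here is a small sketch of that difference. It uses Python's re module as a stand-in for Google Analytics (an assumption on my part; GA's own matching may differ in details), and the names star_only, dot_star, and full_match are made up for illustration.

```python
import re

# A star on its own repeats only the character right before it:
# "ab*" means an "a" followed by zero or more "b" characters.
star_only = re.compile(r"ab*")

# A dot means "any character", so dot-star means
# "any character, repeated as often as you like".
dot_star = re.compile(r".*")


def full_match(pattern, text):
    """Return True when the pattern accounts for the entire text."""
    return pattern.fullmatch(text) is not None


for text in ["a", "abbb", "abc", "everything.jpg"]:
    print(text, full_match(star_only, text), full_match(dot_star, text))
```

The star-only pattern accepts "a" and "abbb" but gives up on "abc", while the dot-star pattern accepts every one of these strings, including an empty one.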

If we wanted to get every occurrence of a jpg file, we would do it with a RegEx that looked like this:
.*\.jpg

For those of you who are scratching your heads instead of nodding them, here is why: .* tells Google Analytics to match everything (as described above). The next part of the expression, \., tells GA to then match a real dot. This is because dots are usually wildcards in their own right, but using a backslash turns them into ordinary dots. The last three characters, jpg, tell GA to match the letters jpg. So we end up with “everything.jpg,” which was just what we wanted.
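To see why the backslash matters, here is a rough comparison of the expression with and without it. Again, this is only a sketch in Python's re module rather than in Google Analytics itself, and the sample paths and the names escaped, unescaped, and matches are hypothetical.

```python
import re

# With the backslash, the dot before "jpg" must be a real dot.
escaped = re.compile(r".*\.jpg")

# Without the backslash, that dot is a wildcard and accepts any character.
unescaped = re.compile(r".*.jpg")


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


# Hypothetical request paths, for illustration only.
samples = ["/images/photo.jpg", "/images/photo-jpg", "/images/photojpg"]

for path in samples:
    print(path, matches(escaped, path), matches(unescaped, path))
```

Only the first path satisfies the escaped version. The unescaped version lets all three through, because its second dot is happy to match a dash or a letter in place of a real dot.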

Robbin
LunaMetrics

Many thanks to Justin and his awesome RegEx Tool (which doesn’t require a download). Postscript: And of course, thanks to Steve, who taught me Regular Expressions from the beginning and found an error in the original version of this post.