A Practical Guide To Getting Started With Regular Expressions (with Sample Data)

December 26, 2013 | Jon Meck

Regular Expressions: The Gift That Keeps on Giving( and Giving)*!

 

When I came to Bounteous, I had never really used regular expressions. I had heard about them, knew they were important, but couldn’t give you one concrete use. “Learn regular expressions!” they said, so learn regular expressions is what I did, still unsure of how or why these would be useful. There were examples online, people talking about Advanced Segments or Custom Filters, but how can you begin to understand these concepts until you actually need to use them? It was only after I began taking on clients and working with Google Analytics and Google Tag Manager that I was able to try out my newfound skills and truly become a convert.

Yet still, I couldn’t help but think that there must be a better way to introduce regular expressions (we’ll call them regex from here on) to complete newcomers. There are plenty of resources out there, which I’ll link to. I’m not going to recreate all of the basic instructions, but I’m going to give examples that I would have found useful when beginning my regex journey.

If you want to play along and you have the access and the know-how, I would recommend starting with Google Analytics and Google Tag Manager on a test site.

What are Regular Expressions?

If you’ve gotten this far, hopefully, you at least know why you’re here. In the most basic sense, regular expressions are a way of finding a needle in a haystack. Think of the Find feature in Microsoft Word or an internet browser. Regex is like CTRL+F but with way more functionality.

When would I need to use Regular Expressions?

This is the question I had trouble answering before I started using regex. Here is my take on it: you use regex when you want to identify a match. You can use that match to launch a function, fire a custom tag, or include/exclude data. You can also use regular expressions to pull out a subsection of text, which you can then use to launch a function, fire a custom tag, rewrite a URL, and so on.
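
To make those two uses concrete, here is a minimal JavaScript sketch. The URL and the product-page pattern are assumptions chosen for illustration, not taken from a real site. The first expression only checks whether there is a match; the second uses a capture group (the parentheses) to pull out a piece of the text.

```javascript
// Hypothetical URL used only for illustration.
const url = "http://www.mysite.com/products/101.php";

// 1. Identify a match: is this a product page?
const isProductPage = /\/products\/\d+\.php$/.test(url);

// 2. Pull out a subsection: the parentheses capture the product number.
const match = url.match(/\/products\/(\d+)\.php$/);
const productId = match ? match[1] : null;

console.log(isProductPage, productId);
```

Either result could then drive a decision, such as whether to fire a tag or which value to report.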

What are some good resources?

Here are some of the resources I found particularly helpful! I found it was useful to have these resources available to me while I was trying to solve a particular problem, like the ones posed below.

5 Ways to Practice Using Regular Expressions

The following are 5 different ways to test out regular expressions and see your results in real-time.

1. Online Tool  – RegexPal – http://regexpal.com/

This website, like many others, allows you to input sample data and then enter a regular expression, highlighting anything in the sample data that matches.

I would recommend turning on the option “^$ match at line breaks.” I’ve included some sample data below and some test regular expressions you can try to write.
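
That option corresponds to what most regex engines call the multiline flag. As a rough sketch in JavaScript (the two-line sample string is an assumption for illustration), the m flag lets ^ and $ match at the start and end of each line instead of only at the start and end of the whole text:

```javascript
// Two URLs separated by a line break, as they would be when pasted into a tester.
const text = "http://www.mysite.com/\nhttps://www.mysite.com/profile";
const pattern = "^https?://.*$";

// Without the m flag, ^ and $ only anchor to the whole text.
const withoutFlag = text.match(new RegExp(pattern, "g")) || [];

// With the m flag, ^ and $ also anchor at every line break.
const withFlag = text.match(new RegExp(pattern, "gm")) || [];

console.log(withoutFlag.length, withFlag.length);
```

With a list of URLs on separate lines, the multiline behavior is usually what you want, because each line is then treated as its own value.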

Test Data
http://www.mysite.com/
http://www.mysite.com/index.php
http://www.mysite.com/products/100.php
http://www.mysite.com/products/101.php
http://www.mysite.com/products/102.php
http://www.mysite.com/inquiries/index.html
http://www.mysite.com/ourteam/index.php
https://www.mysite.com/
https://www.mysite.com/profile
http://www.mysite.com/es/index.php
http://www.mysite.com/es/producto/100.php
http://www.mysite.com/es/producto/101.php
http://www.mysite.com/es/producto/102.php
http://www.mysite.com/search?q=widget
http://www.mysite.com/search?q=widget+thinger
http://www.mysite.com/search?q=smidges
http://www.mysite.com/index/yy.jpg

Test Searches

Here are some examples to test yourself. There are many different ways to accomplish the same goals, but these will help to get you started!

All websites – 17 matches

All secure sites – 2 matches

All Spanish language sites in the ES directory – 4 matches

All index pages – 4 matches

All product pages, English and Spanish – 6 matches
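
If you want to check your answers outside of an online tool, the sketch below runs one possible set of expressions against the test data in JavaScript. These patterns are my own suggestions under the assumption that each URL is tested on its own; they are not necessarily the recommended expressions from the original page, and plenty of other patterns would give the same counts.

```javascript
const urls = [
  "http://www.mysite.com/",
  "http://www.mysite.com/index.php",
  "http://www.mysite.com/products/100.php",
  "http://www.mysite.com/products/101.php",
  "http://www.mysite.com/products/102.php",
  "http://www.mysite.com/inquiries/index.html",
  "http://www.mysite.com/ourteam/index.php",
  "https://www.mysite.com/",
  "https://www.mysite.com/profile",
  "http://www.mysite.com/es/index.php",
  "http://www.mysite.com/es/producto/100.php",
  "http://www.mysite.com/es/producto/101.php",
  "http://www.mysite.com/es/producto/102.php",
  "http://www.mysite.com/search?q=widget",
  "http://www.mysite.com/search?q=widget+thinger",
  "http://www.mysite.com/search?q=smidges",
  "http://www.mysite.com/index/yy.jpg",
];

// One candidate expression per test search (names are illustrative).
const patterns = {
  allSites: /^https?:\/\//,                 // http or https
  secure: /^https:\/\//,                    // https only
  spanish: /\/es\//,                        // anything inside the /es/ directory
  indexPages: /\/index\.(php|html)$/,       // ends in index.php or index.html
  productPages: /\/product[so]\/\d+\.php$/, // /products/ or /producto/
};

// Count how many URLs each expression matches.
const counts = {};
for (const [name, re] of Object.entries(patterns)) {
  counts[name] = urls.filter((u) => re.test(u)).length;
}

console.log(counts);
```

Note that the last URL, /index/yy.jpg, contains the word "index" but is not an index page, which is why the index pattern is anchored to the end of the URL with $.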

2. Online Tool – JSFiddle – http://jsfiddle.net

This is a great website for experimenting with JavaScript and JavaScript libraries like jQuery. I would recommend it to anyone testing just about anything, and it’s a great way to see how regex can determine whether to launch a function or which output to return. I’ve included jQuery, which lets us pull out several properties of the items that are clicked.

Note: Using jQuery $(this) in JavaScript is comparable to using Google Tag Manager’s element inside of Macros.

Check out this fiddle that I put together for examples of using regex to determine whether the clicked links point to PDF or image files, and then using that to decide which alert to show. I’ve also included examples showing how regex can pull out the name or price of a product from a consistent naming structure.
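
The fiddle itself is not reproduced here, but the same idea can be sketched in plain JavaScript without jQuery. The function names, the list of file extensions, and the "Name - $Price" label format are all assumptions made for this illustration; your own naming structure will likely differ.

```javascript
// Decide what kind of file a link points to, based on its extension.
// An optional query string after the extension is tolerated.
function classifyLink(href) {
  if (/\.pdf(\?.*)?$/i.test(href)) return "pdf";
  if (/\.(jpe?g|png|gif)(\?.*)?$/i.test(href)) return "image";
  return "other";
}

// Pull the name and price out of a label such as "Blue Widget - $19.99".
// This label format is hypothetical.
function parseProduct(label) {
  const m = label.match(/^(.+?)\s+-\s+\$(\d+(?:\.\d{2})?)$/);
  return m ? { name: m[1], price: Number(m[2]) } : null;
}

console.log(classifyLink("http://www.mysite.com/index/yy.jpg"));
console.log(parseProduct("Blue Widget - $19.99"));
```

In a click handler, the result of classifyLink could decide which alert to show, and the parsed name and price could be passed along to whatever tracking you are testing.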

3. Google Analytics – https://www.google.com/analytics

A lot of the documentation I read about regex talked about how it works with Google Analytics and custom filters for Views. This is great, but it doesn’t provide the instantaneous feedback I was looking for while trying out new concepts. If you have access to a Google Analytics account for a site with at least a small amount of traffic, I would suggest pulling up any report and using the filter options to test your regular expressions.

The filter box on the screen will accept regex, or you can click on advanced and choose to include/exclude based on your expression.

Site Content – All Pages Report – Pick a few site categories and try to filter them out
All Traffic Report – Suppose you wanted to create a regex that would capture all Sources that come from a social site. For instance, you might begin with something like (facebook|twitter|t\.co), escaping the dot so that it matches a literal period rather than any character.
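
Before relying on a pattern like this in a report, it can help to try it against a handful of source values. The sketch below does that in JavaScript, which only approximates how Google Analytics evaluates regex; the list of sources and the stricter pattern are assumptions for illustration.

```javascript
// Hypothetical Source values, similar to what an All Traffic report might list.
const sources = [
  "facebook.com",
  "m.facebook.com",
  "t.co",
  "twitter.com",
  "google",
  "reddit.com",
  "(direct)",
];

// A loose starting point: matches anywhere inside the source.
const loose = /(facebook|twitter|t\.co)/;

// A stricter version: anchored so the whole source must match.
const strict = /^((m\.)?facebook\.com|twitter\.com|t\.co)$/;

const looseMatches = sources.filter((s) => loose.test(s));
const strictMatches = sources.filter((s) => strict.test(s));

console.log(looseMatches);
console.log(strictMatches);
```

The loose pattern also picks up reddit.com, because "t.co" appears inside it. Whether that is a bug or a happy accident depends on which sources you count as social, which is exactly the kind of thing quick experiments reveal.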

4. Query Explorer – http://ga-dev-tools.appspot.com/explorer/

One of the best things about Google Analytics is the ability to query GA and return exactly the information you need. Whether you’re using Excel and querying via a site like Shufflepoint, or you have a custom Google Drive and are using a script to query the Google Analytics API, the GA Query Explorer is a great place to test your queries. And guess what: regex works here as well!

Link your GA account and try pulling up a basic report, then just like you did in Google Analytics, start filtering it down.

Here’s a basic query that you can pull for whatever time frame you want:

Dimensions: ga:pagePath
Metrics: ga:pageviews
Order: -ga:pageviews

To use the filter field, you can use simple comparison operators like == or !=, or if you want to use regex, you can use =~ to match your expression or !~ to exclude anything that matches it.

To match a specific page, for instance, you could use something like ga:pagePath==/contact.php. Or, say you have two separate contact pages, you could use ga:pagePath=~contact-(english|spanish).

Try filtering out your homepage from this pageview query, or grouping a set of similar pages into a single query.
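
To see how the four filter operators differ without connecting an account, here is a small JavaScript sketch that imitates them on made-up rows. The applyFilter helper and the sample data are hypothetical, and this is not how the Query Explorer is implemented; it only mirrors the behavior described above, assuming page paths begin with a slash.

```javascript
// Made-up report rows for illustration.
const rows = [
  { "ga:pagePath": "/", "ga:pageviews": 120 },
  { "ga:pagePath": "/contact.php", "ga:pageviews": 15 },
  { "ga:pagePath": "/contact-english.php", "ga:pageviews": 9 },
  { "ga:pagePath": "/contact-spanish.php", "ga:pageviews": 4 },
  { "ga:pagePath": "/products/100.php", "ga:pageviews": 42 },
];

// Hypothetical helper that imitates ==, !=, =~ and !~ on local data.
function applyFilter(data, filter) {
  const m = filter.match(/^(ga:\w+)(==|!=|=~|!~)(.*)$/);
  if (!m) throw new Error("Unsupported filter: " + filter);
  const [, field, op, value] = m;
  const tests = {
    "==": (v) => v === value,                 // exact match
    "!=": (v) => v !== value,                 // anything but an exact match
    "=~": (v) => new RegExp(value).test(v),   // regex match
    "!~": (v) => !new RegExp(value).test(v),  // regex does not match
  };
  return data.filter((row) => tests[op](row[field]));
}

const exact = applyFilter(rows, "ga:pagePath==/contact.php");
const contacts = applyFilter(rows, "ga:pagePath=~contact-(english|spanish)");
const noHomepage = applyFilter(rows, "ga:pagePath!~^/$");

console.log(exact.length, contacts.length, noHomepage.length);
```

The last filter is one way to exclude the homepage: ^/$ matches a path that is nothing but a single slash, and !~ keeps everything else.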

5. Google Tag Manager – https://www.google.com/tagmanager/

Here is where regex becomes incredibly useful. With Google Analytics and querying GA, it’s all about analyzing data you have or segmenting your data. With Google Tag Manager, as in the JavaScript example, you can use regex to create specific Rules that can be used as Triggers to launch events, add specific tracking, or populate custom dimensions or metrics.

Just by adding a line of code to your site or your clients’ sites, Google Tag Manager allows you to do all of this via their user interface. In the Fiddle above, we talked about launching alerts based on the type of link that was clicked. If you have access to Google Tag Manager either on a test site or live site, you can set up this same scenario and test/debug it without ever sending the code live.

To do this, you would first set up a Tag to listen for Link Clicks.

You can set a second Tag as Custom HTML with the code:

Lastly, create a Rule where the Event is equal to gtm.linkClick, and then add a regular expression based on the Macro element URL to identify when a link includes PDF or something similar.
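
The Rule's logic can be tried out in plain JavaScript before you configure it in the interface. In this sketch, the ruleMatches function and the exact pattern are assumptions for illustration; Google Tag Manager evaluates the conditions itself, so this only mimics the decision the Rule would make.

```javascript
// Matches URLs whose path ends in .pdf, optionally followed by a query string or fragment.
const PDF_LINK = /\.pdf(\?|#|$)/i;

// Hypothetical stand-in for the Rule: both conditions must be true.
function ruleMatches(eventName, elementUrl) {
  return eventName === "gtm.linkClick" && PDF_LINK.test(elementUrl);
}

console.log(ruleMatches("gtm.linkClick", "http://www.mysite.com/files/brochure.pdf"));
console.log(ruleMatches("gtm.linkClick", "http://www.mysite.com/products/100.php"));
```

The i flag makes the match case-insensitive, so links ending in .PDF are caught as well.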

Conclusion

Regular expressions are extremely powerful once you understand when to use them. Using the tools available online, you can begin experimenting with regex to match text or extract subsections from it, and then use that information to perform an action. Practicing with Google Analytics, Query Explorer, and Google Tag Manager will prepare you for the odd scenarios you encounter in your own data or with specific clients.