A Practical Guide To Getting Started With Regular Expressions (with Sample Data)
Regular Expressions: The Gift That Keeps on Giving( and Giving)*!
When I came to LunaMetrics, I had never really used regular expressions. I had heard about them, knew they were important, but couldn’t give you one concrete use. “Learn regular expressions!” they said, so learn regular expressions is what I did, still unsure of how or why these would be useful. There were examples online, people talking about Advanced Segments or Custom Filters, but how can you begin to understand these concepts until you actually need to use them? It was only after I began taking on clients and working with Google Analytics and Google Tag Manager that I was able to try out my newfound skills and truly become a convert.
Yet still, I couldn’t help but think that there must be a better way to introduce regular expressions (we’ll call them regex from here on) to complete newcomers. There are plenty of resources out there, which I’ll link to. I’m not going to recreate all of the basic instructions, but I’m going to give examples that I would have found useful when beginning my regex journey.
If you want to play along and you have the access and the know-how, I would recommend starting with Google Analytics and Google Tag Manager on a test site.
What are Regular Expressions?
If you’ve gotten this far, hopefully you at least know why you’re here. In the most basic sense, regular expressions are a way of finding a needle in a haystack. Think of the Find feature in Microsoft Word or an internet browser. Regex is like CTRL+F, but with way more functionality.
When would I need to use Regular Expressions?
This is the question that I had trouble answering before I started using regex. Here is my take on it: you use regex when you want to identify a match. You can use that match to launch a function, fire a custom tag, or include or exclude data. You can also use regular expressions to pull out a subsection of text, which you can then use to launch a function, fire a custom tag, rewrite a URL, and so on.
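As a minimal sketch of those two uses, here is a short Python example. The URL and the variable names are assumptions chosen for illustration. The first search only asks whether there is a match, while the second uses parentheses to pull out a subsection of the text.

```python
import re

# Hypothetical URL, used only for illustration.
url = "http://www.mysite.com/products/100.php"

# 1. Identify a match: does the URL contain "/products/"?
is_product_page = re.search(r"/products/", url) is not None

# 2. Pull out a subsection: the parentheses capture the digits before ".php".
found = re.search(r"/products/(\d+)\.php$", url)
product_id = found.group(1) if found else None

print(is_product_page, product_id)  # True 100
```

Either result could then drive an action, such as firing a tag only on product pages or recording the product ID.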
What are some good resources?
Here are some of the resources I found particularly helpful! I found it was useful to have these resources available to me while I was trying to solve a particular problem, like the ones posed below.
- Regular Expressions for Google Analytics eBook (by our very own Robbin Steif)
- Regular Expressions Quick Start
- Regex Crossword Puzzles – a fun way to test yourself
- Regular Expressions Have Many Uses
5 Ways to Practice Using Regular Expressions
The following are 5 different ways to test out regular expressions and see your results in real-time.
1. Online Tool – RegexPal – http://regexpal.com/
This website, like many others, allows you to input sample data and then enter a regular expression, highlighting anything in the sample data that matches.
I would recommend turning on the option “^$ match at line breaks.” I’ve included some sample data below and some test regular expressions you can try to write.
http://www.mysite.com/
http://www.mysite.com/index.php
http://www.mysite.com/products/100.php
http://www.mysite.com/products/101.php
http://www.mysite.com/products/102.php
http://www.mysite.com/inquiries/index.html
http://www.mysite.com/ourteam/index.php
https://www.mysite.com/
https://www.mysite.com/profile
http://www.mysite.com/es/index.php
http://www.mysite.com/es/producto/100.php
http://www.mysite.com/es/producto/101.php
http://www.mysite.com/es/producto/102.php
http://www.mysite.com/search?q=widget
http://www.mysite.com/search?q=widget+thinger
http://www.mysite.com/search?q=smidges
http://www.mysite.com/index/yy.jpg
All websites – 17 matches
(.*) This will capture any number of any characters
All secure sites – 2 matches
^https.* This will capture any URLs that begin with https
All Spanish language sites in the ES directory – 4 matches
.*/es/.* This will look for anything that contains the string /es/ at some point in the URL. Note that if you just look for ES, you may also get matches where ES is present in the name of the page, like inquiries/index.html
All index pages – 4 matches
All product pages – English and Spanish
.*/product[os] or .*/product(o|s) Both of these will match URLs with products or producto in the string.
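If you would rather check your answers in code, the following Python sketch runs the patterns above against the sample data and counts how many URLs each one matches. The pattern for index pages is my own assumption, since none is given above, and the dictionary keys and the helper name count_matches are just labels for this example.

```python
import re

SAMPLE_URLS = """\
http://www.mysite.com/
http://www.mysite.com/index.php
http://www.mysite.com/products/100.php
http://www.mysite.com/products/101.php
http://www.mysite.com/products/102.php
http://www.mysite.com/inquiries/index.html
http://www.mysite.com/ourteam/index.php
https://www.mysite.com/
https://www.mysite.com/profile
http://www.mysite.com/es/index.php
http://www.mysite.com/es/producto/100.php
http://www.mysite.com/es/producto/101.php
http://www.mysite.com/es/producto/102.php
http://www.mysite.com/search?q=widget
http://www.mysite.com/search?q=widget+thinger
http://www.mysite.com/search?q=smidges
http://www.mysite.com/index/yy.jpg
""".splitlines()

PATTERNS = {
    "all": r"(.*)",
    "secure": r"^https.*",
    "spanish": r".*/es/.*",
    # Assumed answer for the index-page exercise: "index." followed by
    # php or html at the end of the URL, which skips /index/yy.jpg.
    "index": r".*/index\.(php|html)$",
    "products": r".*/product[os]",
}

def count_matches(pattern, urls):
    """Count the URLs that contain at least one match for the pattern."""
    return sum(1 for url in urls if re.search(pattern, url))

counts = {name: count_matches(pattern, SAMPLE_URLS) for name, pattern in PATTERNS.items()}
print(counts)
# {'all': 17, 'secure': 2, 'spanish': 4, 'index': 4, 'products': 6}
```

An online tester highlights the matching text itself, while this sketch only counts URLs, but the totals should line up with the match counts listed above.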
2. Online Tool – JSFiddle – http://jsfiddle.net
Check out this fiddle that I put together for examples of using regex to determine whether the clicked links point to PDFs or image files, and then using that to determine which alert to show. I’ve also included examples showing how we can use regex to pull out the name of a product or its price from a consistent naming structure.
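The fiddle itself is JavaScript and is not reproduced here. As a rough stand-in, this Python sketch shows the same two ideas: classifying a link by its file extension, and extracting a product name and price. The list of image extensions, the naming structure "name - $price", and the function names are all assumptions made for this example.

```python
import re

# Assumed extensions; adjust these to the file types your site links to.
PDF_LINK = re.compile(r"\.pdf$", re.IGNORECASE)
IMAGE_LINK = re.compile(r"\.(jpe?g|png|gif)$", re.IGNORECASE)

def classify_link(href):
    """Return 'pdf', 'image', or 'other' based on how the link ends."""
    if PDF_LINK.search(href):
        return "pdf"
    if IMAGE_LINK.search(href):
        return "image"
    return "other"

# Hypothetical naming structure: "<product name> - $<price>"
PRODUCT_LABEL = re.compile(r"^(?P<name>.+?) - \$(?P<price>\d+(?:\.\d{2})?)$")

def parse_product(label):
    """Return (name, price) if the label follows the structure, else None."""
    found = PRODUCT_LABEL.match(label)
    if not found:
        return None
    return found.group("name"), float(found.group("price"))

print(classify_link("http://www.mysite.com/index/yy.jpg"))  # image
print(parse_product("Blue Widget - $19.99"))  # ('Blue Widget', 19.99)
```

Because both link patterns are anchored with $, a link with a query string after the extension would fall through to "other". Whether that is what you want depends on your own URLs.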
3. Google Analytics – https://www.google.com/analytics
A lot of the documentation I read about regex talked about how it works with Google Analytics and custom filters for Views. This is great, but it doesn’t provide the instantaneous feedback that I was looking for while trying out new concepts. If you have access to a Google Analytics account for a site that has at least a small amount of traffic, I would suggest just pulling up any report and using the filter options to test out your regular expressions.
The filter box on the screen will accept regex, or you can click on advanced and choose to include or exclude based on your expression.
Site Content – All Pages Report – Pick a few site categories and try to filter them out
All Traffic Report – Suppose you wanted to create a regex that would capture all Sources that come from a social site. For instance, you may begin with something like (facebook|twitter|t\.co), where the backslash makes the dot match a literal period rather than any character.
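To see how a social-source pattern like this behaves before trying it in a report, here is a small Python sketch. The source names are made up for illustration. The pattern escapes the dot in t.co so that it only matches a literal period, and the last two lines show what an unescaped dot would let through.

```python
import re

# The dot in "t.co" is escaped so it matches only a literal period.
SOCIAL_SOURCE = re.compile(r"(facebook|twitter|t\.co)")

# Hypothetical source names, similar to what an All Traffic report might list.
sources = ["google", "facebook.com", "m.facebook.com", "t.co",
           "twitter.com", "bing", "(direct)", "best-coupons.com"]

social = [source for source in sources if SOCIAL_SOURCE.search(source)]
print(social)  # ['facebook.com', 'm.facebook.com', 't.co', 'twitter.com']

# With an unescaped dot, "t.co" also matches the "t-co" in "best-coupons.com".
print(re.search(r"t.co", "best-coupons.com") is not None)  # True
print(SOCIAL_SOURCE.search("best-coupons.com") is not None)  # False
```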
4. Query Explorer – http://ga-dev-tools.appspot.com/explorer/
One of the best things about Google Analytics is the ability to query GA and return the information you need. Whether you’re using Excel and querying via a site like Shufflepoint, or you have a custom Google Drive setup and are using a script to query the Google Analytics API, the GA Query Explorer is a great place to test your queries. And guess what: regex works here as well!
Link your GA account and try pulling up a basic report, then just like you did in Google Analytics, start filtering it down.
Here’s a basic query that you can pull for whatever time frame you want:
To use the filter field, you can use simple comparison operators like == or !=, or if you want to use regex, you can use =~ for matching your expression or !~ for not matching your expression.
To match a specific page, for instance, you could use something like ga:pagePath==/contact.php, or, if you have two separate contact pages, you could use ga:pagePath=~contact-(english|spanish).
Try filtering out your homepage from this pageview query, or grouping a set of similar pages into a single query.
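If you want to reason about these filters away from the Query Explorer, the sketch below emulates the four operators locally in Python. It does not call the Google Analytics API. The rows, page paths, and the apply_filter helper are assumptions for illustration, and it treats =~ as a partial match, which is my understanding of how the filter behaves.

```python
import re

# Hypothetical (pagePath, pageviews) rows standing in for a query result.
rows = [
    ("/", 120),
    ("/contact-english", 10),
    ("/contact-spanish", 4),
    ("/products/100.php", 37),
]

def apply_filter(rows, expression):
    """Locally emulate a filter such as 'ga:pagePath=~contact-(english|spanish)'."""
    found = re.match(r"^ga:pagePath(==|!=|=~|!~)(.*)$", expression)
    if not found:
        raise ValueError("unsupported filter: " + expression)
    operator, value = found.groups()
    tests = {
        "==": lambda path: path == value,                       # exact match
        "!=": lambda path: path != value,                       # not an exact match
        "=~": lambda path: re.search(value, path) is not None,  # regex match
        "!~": lambda path: re.search(value, path) is None,      # no regex match
    }
    keep = tests[operator]
    return [row for row in rows if keep(row[0])]

# Filter out the homepage.
print(apply_filter(rows, "ga:pagePath!~^/$"))

# Group both contact pages and total their pageviews.
contact = apply_filter(rows, "ga:pagePath=~contact-(english|spanish)")
print(sum(views for _, views in contact))  # 14
```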
5. Google Tag Manager – https://www.google.com/tagmanager/
Just by adding a line of code to your site or your clients’ sites, Google Tag Manager allows you to do all of this via their user interface. In the Fiddle above, we talked about launching alerts based off of the type of link that was clicked. If you have access to Google Tag Manager either on a test site or live site, you can set up this same scenario and test/debug it without ever sending the code live.
To do this, you would first set up a Tag to listen for Link Clicks.
You can set a second Tag as Custom HTML with the code:
Lastly, create a Rule where the Event is equal to gtm.linkClick, and then add a regular expression based on the Macro element url to identify when a link includes PDF or something similar.
Regular expressions are extremely powerful once you understand when to reach for them. Using the tools available online, you can begin experimenting with regex to match text or extract subsections of it, and then use that information to perform an action. Practicing with Google Analytics, Query Explorer, and Google Tag Manager will prepare you for the odd scenarios you encounter in your own data or with specific clients.