Keyword Analysis By Number Of Terms (and The RegEx That Helps)

January 2, 2008

Do long search phrases convert better?

This was what I wanted to find out for a particular client, but it took some work. I used a regular expression in the Keywords Report of Google Analytics to filter by the number of terms in the Keyword Phrase. The exported results showed a clear increase in conversion rate as the number of search terms increased.

This client was doing far better with searchers who were using a lot of terms. They were being specific! They knew just what they were looking for and were ready to buy. This data put additional power behind recommendations concerning content, search engine optimization and paid search strategies.

Number of terms    Conversion rate
1                  0.59%
2                  0.60%
3                  0.90%
4                  1.17%
5                  1.06%
6                  1.22%
7                  1.88%
8                  3.33%



Even though a lot of people were using long search phrases, that fact was obscured in the standard report. As the number of terms increased, the number of people searching for exactly that phrase decreased, so no individual phrase seemed to count for much. This is the so-called Long Tail.
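To make the aggregation idea concrete, here is a minimal sketch in Python of how exported keyword rows could be rolled up by number of terms. The rows, the column order (keyword, visits, conversions) and the function name are all invented for illustration; they are not the client's data or the actual export format. It also counts terms by finding runs of word characters, which is a simpler stand-in for the filter expression discussed later in this post.

```python
import re
from collections import defaultdict

# Hypothetical export rows: (keyword phrase, visits, conversions).
# These numbers are made up purely to illustrate the roll-up.
rows = [
    ("widgets", 1000, 6),
    ("blue widgets", 400, 3),
    ("buy blue widgets online", 30, 1),
    ("cheap blue widgets online", 20, 0),
    ("order blue widgets online", 25, 1),
]

def conversion_by_term_count(rows):
    """Sum visits and conversions per term count, then compute the rate."""
    totals = defaultdict(lambda: [0, 0])  # term count -> [visits, conversions]
    for keyword, visits, conversions in rows:
        n = len(re.findall(r"\w+", keyword))  # number of word runs in the phrase
        totals[n][0] += visits
        totals[n][1] += conversions
    return {
        n: conv / visits
        for n, (visits, conv) in sorted(totals.items())
        if visits  # skip buckets with no visits to avoid dividing by zero
    }

rates = conversion_by_term_count(rows)
for n, rate in rates.items():
    print(f"{n} terms: {rate:.2%}")
```

Each four-term phrase here has only a handful of visits, so none stands out alone, but summed into one bucket they produce a rate that can be compared with the shorter phrases.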

You really have to dig to find these sorts of gems but they are invaluable in the pursuit of providing information that can be acted upon.

A tool for digging

The tool is a Regular Expression, a pattern matching language. If you’re not already familiar with it, there is a great series of articles right here on the LunaMetrics blog.

Here is what I used:


It accounts for the most common characters I’ve found between words.

Steve (see comments) pointed out a great way to shorten my expression by using the \W character set. Here is what it looks like.


\W is shorthand for all non-word characters.

How do I use it?


I know this may look like gibberish but keep reading — you don’t need to understand it to get some use from it.

In Google Analytics, go to Traffic Sources > Keywords and paste the Regular Expression into the box at the bottom of the data. Just change the {3} to whatever number of terms you want to see and click the GO button.
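If you would rather run the same filter over an exported keyword list, the idea can be sketched in Python. This is only a sketch under assumptions: the pattern is the shortened \W form as I read it from the description in this post, the function name term_count_pattern and the sample keywords are hypothetical, and Python's re module may not behave identically to the Google Analytics filter box in every corner case.

```python
import re

def term_count_pattern(n: int) -> str:
    """Return a pattern that matches phrases with exactly n terms."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # \W* skips separators, \w+ matches a term, \b forces the term to end.
    return r"^(\W*\w+\b){%d}$" % n

# Hypothetical keyword phrases, standing in for an exported report.
keywords = [
    "shoes",
    "red shoes",
    "red running shoes",
    "cheap red running shoes",
]

three = re.compile(term_count_pattern(3))
matches = [k for k in keywords if three.search(k)]
print(matches)
```

Changing the argument to term_count_pattern plays the same role as changing the {3} in the filter box.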

A brief look at the RegEx

Although this is not strictly a Regular Expression post, I feel obligated to include a basic glance at the different parts of the expression. Feel free to skip this if you just don’t care.

^ anchors the beginning of the match to the beginning of the string

( ) used to group a set of items together for a match

[+*"*\s*,*'*-*]* This group matches any number and any order of + " , - ' and whitespace (\s). It is what handles all the characters that might end up separating different search terms.

\w+ Matches 1 or more alphanumeric characters (\w is another pre-defined set of characters, like \s)

\b Matches a word boundary. It forces each run of \w characters to be separated from the next by something. Without it, the expression would also match any unbroken string of {3} or more word characters.

{3} Requires exactly 3 of the above sequence so it would match the phrase one two three but not one two three four

$ anchors the end of the match to the end of the string
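As a rough check of how the pieces fit together, the parts listed above can be assembled in order and tried in Python. Treat this as a reconstruction rather than the exact expression: it uses the shortened \W form for the separators, the names SEPARATORS, WORD, BOUNDARY and build are hypothetical, and it runs on Python's re module instead of the Google Analytics filter. The second pattern leaves out \b to show why the boundary is needed.

```python
import re

SEPARATORS = r"\W*"  # any run of non-word characters between terms
WORD = r"\w+"        # one or more word characters
BOUNDARY = r"\b"     # the term must end at a word boundary

def build(n, boundary=BOUNDARY):
    """Assemble ^( separators word boundary ){n}$ from its parts."""
    return re.compile("^(" + SEPARATORS + WORD + boundary + "){" + str(n) + "}$")

with_b = build(3)
without_b = build(3, boundary="")

# Exactly three terms matches; four terms does not.
print(bool(with_b.search("one two three")))
print(bool(with_b.search("one two three four")))

# Without \b, a single three-letter word is split into three "terms".
print(bool(without_b.search("abc")))
print(bool(with_b.search("abc")))
```

With the boundary removed, \w+ is free to match one character at a time, which is how a single word such as abc ends up satisfying {3}.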


Don’t Sweat the Small Stuff

You can’t account for every situation. For example, sometimes ‘ is meant as an apostrophe and sometimes “-” is used as a hyphen. In the end the impact is usually small – just 2-3% of the search phrases were affected in my case and they just get bumped to the next higher match instead. (For example, non-glare window would match at {3} instead of {2})
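The bump to the next higher match is easy to see with a small Python sketch. As before, the pattern is my reading of the shortened \W form, and count_terms and its max_terms limit are hypothetical helpers, not something Google Analytics provides.

```python
import re

def count_terms(phrase, max_terms=10):
    """Return the n for which the n-term pattern matches, or None."""
    for n in range(1, max_terms + 1):
        if re.search(r"^(\W*\w+\b){%d}$" % n, phrase):
            return n
    return None

# The hyphen and the apostrophe both act as separators,
# so each of these two-word phrases is counted as three terms.
print(count_terms("non-glare window"))
print(count_terms("men's shoes"))
print(count_terms("window"))
```

Because the hyphen and the apostrophe are non-word characters, each one splits a word into two terms, which is the small inaccuracy described above.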

It is an interesting way to look at keyword data, and maybe you'll get some use from it. If you do, let me know.