Regular Expressions in Google Analytics

Howdy Nation!

Google Analytics uses regular expressions (regex) to define filters and track goals.  Regular expressions simplify the process of defining filters and tracking goals, because a single pattern can stand in for many literal values.  For instance, one regular expression can exclude a whole range of IP addresses.  I am going to explain the meaning of each symbol in the context of regular expressions.

The dot symbol is a wildcard and will match any single character.  For example, “Act ., Scene 3” matches “Act 1, Scene 3” or “Act 2, Scene 3”.
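If you want to see the wildcard in action before touching a filter, here is a minimal sketch.  It uses Python's re module as a stand-in for testing, which is my assumption and not something Google Analytics itself runs.  Note the space before the dot in the pattern: the space matches the space after “Act” and the dot matches the single digit.  The variable names and sample strings are only for illustration.

```python
import re

# "." stands for exactly one character (any character except a newline).
dot_pattern = re.compile(r"Act ., Scene 3")

samples = [
    "Act 1, Scene 3",
    "Act 2, Scene 3",
    "Act 12, Scene 3",  # two characters where the dot allows only one
    "Act , Scene 3",    # no character left over for the dot
]

# search() looks for the pattern anywhere in the string.
results = {text: bool(dot_pattern.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The last two samples fail because the dot must consume one character, no more and no less.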

The \ symbol is used to escape the special meaning of meta-characters.  For instance, “U\.S\. Holiday” only matches “U.S. Holiday”.  The pattern “193.164.1.1” does not match only that IP address, because each unescaped dot matches any character.  An exact match for this IP address is “193\.164\.1\.1”, where every dot is escaped with the \ symbol.
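To show why the escaping matters, the sketch below compares the escaped and unescaped versions of the IP pattern.  Python's re module is assumed here purely as a test bench, and the “lookalike” string is a made-up value chosen to fool the unescaped pattern.

```python
import re

unescaped = re.compile(r"193.164.1.1")    # each dot matches ANY character
escaped = re.compile(r"193\.164\.1\.1")   # each dot matches a literal "."

real_ip = "193.164.1.1"
lookalike = "193x164y1z1"  # made-up string, not an IP address at all

unescaped_hits = [bool(unescaped.search(s)) for s in (real_ip, lookalike)]
escaped_hits = [bool(escaped.search(s)) for s in (real_ip, lookalike)]

# re.escape() adds the backslashes for you when the text is meant literally.
auto_escaped = re.compile(re.escape(real_ip))
auto_hits = [bool(auto_escaped.search(s)) for s in (real_ip, lookalike)]

print(unescaped_hits, escaped_hits, auto_hits)
```

The unescaped pattern accepts both strings, while the escaped versions accept only the real address.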

Square brackets are used to match one item in a character set or to define a range of values.  Use the hyphen inside a character set to specify a range of values such as [0-9].  Negate a character set by placing a caret ^ symbol immediately after the opening square bracket.  For example, [^0-9] matches any single character that is not a digit from zero to nine.
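Here is a small sketch of a range and its negated twin, again assuming Python's re module as a convenient place to experiment.  The sample strings are invented for the example.

```python
import re

digit = re.compile(r"[0-9]")       # any one digit
non_digit = re.compile(r"[^0-9]")  # any one character that is NOT a digit

# findall() returns every separate match, one character at a time here.
digits_found = digit.findall("Room 42b")
non_digits_found = non_digit.findall("42b")

# A string made only of digits gives the negated set nothing to match.
all_digits = non_digit.search("2024") is None

print(digits_found, non_digits_found, all_digits)
```

A negated set still has to match a character, so [^0-9] finds nothing in a string made only of digits.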

Curly brackets are used to specify the amount of repetition of the previous item.  For instance, “31{1,3}” matches a 3 followed by one to three 1s: “31”, “311”, and “3111”.  It does not match “3”, and it does not match all of “31111”.
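The repetition count applies only to the item right before the brackets, which is the 1 in this case.  The sketch below, assuming Python's re module as a test bench, checks the pattern 31{1,3} two ways: against the whole string (fullmatch) and anywhere inside the string (search).

```python
import re

repeat = re.compile(r"31{1,3}")  # a "3" followed by one to three "1"s

candidates = ["3", "31", "311", "3111", "31111"]

# fullmatch(): the entire string must fit the pattern.
full = {s: bool(repeat.fullmatch(s)) for s in candidates}

# search(): the pattern may match any part of the string.
partial = {s: bool(repeat.search(s)) for s in candidates}

print(full)
print(partial)
```

Notice that “31111” fails as a whole string but still contains a match (“3111”).  If the extra characters matter to you, anchor the pattern with ^ and $.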

The ? symbol matches zero or one of the previous item.  The + symbol matches one or more of the previous item.  The * symbol matches zero or more of the previous item.
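The three quantifiers are easy to mix up, so here is a side-by-side sketch: ? allows zero or one, + requires one or more, and * allows zero or more.  Python's re module is assumed as the test bench, and the words are arbitrary examples.

```python
import re

optional = re.compile(r"colou?r")      # "u" may appear zero or one time
one_or_more = re.compile(r"go+gle")    # "o" must appear at least once
zero_or_more = re.compile(r"go*gle")   # "o" may be missing entirely

optional_hits = [bool(optional.fullmatch(s)) for s in ("color", "colour", "colouur")]
plus_hits = [bool(one_or_more.fullmatch(s)) for s in ("ggle", "gogle", "google")]
star_hits = [bool(zero_or_more.fullmatch(s)) for s in ("ggle", "gogle", "google")]

print(optional_hits, plus_hits, star_hits)
```

The only difference between the last two rows is “ggle”, which has no “o” at all: the + pattern rejects it and the * pattern accepts it.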

It is important to remember that the combination .* will match any string of any size, including an empty one!  It is used as a wildcard of indeterminate length.
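A typical use of .* is to say “anything can go here”.  The sketch below assumes Python's re module as a test bench and uses a hypothetical URL pattern and hypothetical page paths, so swap in your own.

```python
import re

# Hypothetical pattern: anything under /products/ that ends in .html
product_page = re.compile(r"^/products/.*\.html$")

paths = [
    "/products/shoes/red.html",  # ".*" covers "shoes/red"
    "/products/.html",           # ".*" covers nothing at all
    "/products/shoes",           # no ".html" ending, so no match
]

matches = [bool(product_page.search(p)) for p in paths]
print(matches)
```

The second path matches because .* is happy to match zero characters, which is worth keeping in mind when a pattern seems to let too much through.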

The parentheses () symbols identify groups and remember content as an item.  The pipe | symbol means either/or.

Here is an example of how to use these regular expressions.  “(u\.?s\.?|U\.?S\.?) Holiday” will match “U.S. Holiday”, “US Holiday”, “u.s. Holiday”, “us Holiday”, “U.S Holiday”, and “US. Holiday”.
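The holiday pattern can be checked against every variation listed above with a short sketch.  Python's re module is assumed as the test bench, and the two rejected strings are extra examples I added to show where the pattern draws the line.

```python
import re

# Group with two alternatives: all lowercase or all uppercase, dots optional.
holiday = re.compile(r"(u\.?s\.?|U\.?S\.?) Holiday")

expected = [
    "U.S. Holiday", "US Holiday", "u.s. Holiday",
    "us Holiday", "U.S Holiday", "US. Holiday",
]
rejected = ["Us Holiday", "UK Holiday"]  # mixed case and a different country

all_expected_match = all(holiday.fullmatch(s) for s in expected)
none_rejected_match = not any(holiday.fullmatch(s) for s in rejected)

# The parentheses remember what the group matched.
remembered = holiday.fullmatch("U.S. Holiday").group(1)

print(all_expected_match, none_rejected_match, remembered)
```

Mixed case such as “Us Holiday” fails because each side of the pipe is all lowercase or all uppercase.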

Regular expression anchors are the ^ and $ symbols.  The ^ symbol anchors a match to the start of a string and the $ symbol anchors it to the end of a string.  For example, “^US” matches “US Holiday” but not “Next Monday is a US Holiday”.  And “192\.168\.1\.1$” matches “192.168.1.1” but not “192.168.1.14”, because the $ anchor requires the string to end right after the final 1.
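To see what the anchors buy you, the sketch below runs the IP pattern with and without the $ symbol against an address that merely starts the same way.  Python's re module is assumed as the test bench, and “192.168.1.14” is a sample address chosen for the comparison.

```python
import re

starts_with_us = re.compile(r"^US")
anchored_ip = re.compile(r"192\.168\.1\.1$")
unanchored_ip = re.compile(r"192\.168\.1\.1")

start_hits = [
    bool(starts_with_us.search("US Holiday")),
    bool(starts_with_us.search("Next Monday is a US Holiday")),
]

# Without "$", the pattern also matches the first part of 192.168.1.14.
anchored_hits = [bool(anchored_ip.search(s)) for s in ("192.168.1.1", "192.168.1.14")]
unanchored_hits = [bool(unanchored_ip.search(s)) for s in ("192.168.1.1", "192.168.1.14")]

print(start_hits, anchored_hits, unanchored_hits)
```

For a truly exact match, use both anchors together, as in “^192\.168\.1\.1$”.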

The symbol \d matches any digit, the same as [0-9].  The \s symbol matches any white space character.  The \w symbol matches any letter, number or underscore, the same as [A-Za-z0-9_].  Here is another example of these regular expressions in action.  The regex “\d{1,5}\s\w*” matches “345 Cherrymeadow”.  Remember that the * symbol lets \w match a word of any length.
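Here is the street address pattern as a sketch you can run.  It assumes Python's re module as the test bench, and it passes the re.ASCII flag so that \d and \w behave like [0-9] and [A-Za-z0-9_] (by default Python also accepts non-English letters and digits).  The extra sample strings are made up.

```python
import re

# One to five digits, one white space character, then any run of word characters.
address = re.compile(r"\d{1,5}\s\w*", re.ASCII)

whole = address.search("345 Cherrymeadow").group()

# \w does not include spaces, so the match stops at the end of the first word.
first_word_only = address.search("345 Cherry Meadow Lane").group()

# "*" allows zero word characters, so the number and the space are enough.
no_word = address.search("12 ").group()

# The underscore counts as a word character.
underscore_ok = re.fullmatch(r"\w", "_", re.ASCII) is not None

print(repr(whole), repr(first_word_only), repr(no_word), underscore_ok)
```

If a street name can have several words, the pattern needs to allow spaces after the first word as well.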

Regular expressions are used to create filters, set up goals, track equivalent pages, and filter data within the reporting interface.  The Regex Generator on the Google Analytics Help Center creates ranges for IP addresses.  For instance, “192.168.1.1 – 192.168.1.24” is equivalent to “^192\.168\.1\.([1-9]|1[0-9]|2[0-4])$” using regular expressions.  Remember that even a carefully written regex can have flaws, so it is important to test each one against values that should and should not match, and to have someone double-check your work.
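One way to do that testing is to run the pattern against every address in the range, plus a few that sit just outside it.  The sketch below assumes Python's re module as the test bench.  The “outside” addresses are ones I picked because they are easy to let through by accident.

```python
import re

ip_range = re.compile(r"^192\.168\.1\.([1-9]|1[0-9]|2[0-4])$")

# Every address from 192.168.1.1 through 192.168.1.24 should match.
inside = [f"192.168.1.{n}" for n in range(1, 25)]

# Near misses: just below, just above, a longer number, a different subnet.
outside = ["192.168.1.0", "192.168.1.25", "192.168.1.240", "192.168.2.5"]

all_inside_match = all(ip_range.search(ip) for ip in inside)
none_outside_match = not any(ip_range.search(ip) for ip in outside)

print(all_inside_match, none_outside_match)
```

The $ anchor is what keeps “192.168.1.240” out.  Without it, the “24” would be enough for a match.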

Comment below and I will entertain questions.


Posted in Analytics, Analytics Tools, Google Analytics

