
Regex for Google Analytics Goals

I've searched the other questions about regex in Google Analytics, but I can't use their answers because my problem is fairly specific.

I want to set up a goal that uses a regex to fire only IF the string includes

/client-thank-you/ AND anything EXCEPT hire

So, in other words:

/client-thank-you/hire is not correct

/client-thank-you/anything/else is correct

Each of the following regexes will match any string that contains /client-thank-you/ and does not contain hire, depending on what assumption(s) you make about where "hire" is in the string.


Solution

Where can "hire" be located in the string?

Anywhere:

((?!hire).)*?/client-thank-you/((?!hire).)*

Only following the "/client-thank-you/":

.*?/client-thank-you/((?!hire).)*

Only immediately following the "/client-thank-you/":

.*?/client-thank-you/(?!hire).*
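The difference between the three assumptions is easier to see on concrete inputs. The following is a minimal sketch, assuming Python's re module (which supports lookahead) and whole-string matching via re.fullmatch; the sample paths and the is_goal helper are hypothetical and only chosen to separate the three cases.

```python
import re

# The three patterns from the answer, one per assumption about where "hire" may be.
ANYWHERE = r"((?!hire).)*?/client-thank-you/((?!hire).)*"
AFTER = r".*?/client-thank-you/((?!hire).)*"
IMMEDIATELY_AFTER = r".*?/client-thank-you/(?!hire).*"

# Hypothetical sample paths (not from the original question, except the first two).
SAMPLES = [
    "/client-thank-you/anything/else",
    "/client-thank-you/hire",
    "/client-thank-you/foo/hire",
    "/hire/client-thank-you/foo",
]


def is_goal(pattern: str, path: str) -> bool:
    """Return True when the pattern matches the whole path."""
    return re.fullmatch(pattern, path) is not None


# One list of booleans per pattern, in the same order as SAMPLES.
results = {
    name: [is_goal(pattern, path) for path in SAMPLES]
    for name, pattern in [
        ("anywhere", ANYWHERE),
        ("after", AFTER),
        ("immediately_after", IMMEDIATELY_AFTER),
    ]
}

for name, flags in results.items():
    print(name, flags)
```

Only the first pattern rejects "/hire/client-thank-you/foo", and only the third accepts "/client-thank-you/foo/hire", which is why the choice depends on where "hire" can appear in your URLs.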

Notes

Optimization:

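Each of these regexes is written to match the entire string, so the "no hire" guarantee holds only when the whole string has to match (or when you add ^ and $ anchors yourself). If your tool instead checks whether a string contains a substring match, you can drop the leading .*? from the second and third regexes, as long as the second one keeps an end anchor: /client-thank-you/((?!hire).)*$. Without the anchor, the trailing group can match zero characters and the regex accepts any string containing /client-thank-you/. Likewise, the third regex can be shortened further by removing the trailing .* as well.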

Positively require "anything":

Note that all of these regexes assume that a string that ends with "/client-thank-you/" (with nothing after it) is valid. If this assumption is incorrect (i.e. a string matching .*/client-thank-you/$ should not be a match), then change the trailing * on every regex to +. This also means that you have to keep the last .* on the third regex, as a .+ (i.e. don't optimize it away).
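The two notes above can be sketched in Python as follows. This assumes a substring search (re.search) rather than a whole-string match; the constant names and sample paths are hypothetical. Note that the shortened second pattern keeps a trailing $ so that its "no hire" check still has to reach the end of the string.

```python
import re

# Substring-search forms: the leading .*? is dropped.
AFTER_SEARCH = r"/client-thank-you/((?!hire).)*$"
IMMEDIATE_SEARCH = r"/client-thank-you/(?!hire)"

# Variant that positively requires something after the prefix (.* became .+).
IMMEDIATE_NONEMPTY = r"/client-thank-you/(?!hire).+"


def found(pattern: str, path: str) -> bool:
    """Return True when the pattern matches somewhere inside the path."""
    return re.search(pattern, path) is not None


print(found(AFTER_SEARCH, "/x/client-thank-you/foo"))           # no "hire" after the prefix
print(found(AFTER_SEARCH, "/x/client-thank-you/foo/hire"))      # "hire" later in the path
print(found(IMMEDIATE_SEARCH, "/client-thank-you/foo/hire"))    # "hire" not directly after
print(found(IMMEDIATE_SEARCH, "/client-thank-you/hire"))        # "hire" directly after
print(found(IMMEDIATE_NONEMPTY, "/client-thank-you/"))          # nothing after the prefix
```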



EDIT:

The above will not work since GA uses a very limited version of regex (that does not include lookaround). If there is no other GA tool (other than a single regex) that you can use that meets your needs, then you could use the following as a last-ditch effort:

([-._~!$&'()*+,;=:@/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:@/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:@/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:@/0-9A-Za-df-z]|.{1,3}$)

And in expanded form for illustration purposes only:

(
    [-._~!$&'()*+,;=:@/0-9A-Za-gi-z]
  | h[-._~!$&'()*+,;=:@/0-9A-Za-hj-z]
  | hi[-._~!$&'()*+,;=:@/0-9A-Za-qs-z]
  | hir[-._~!$&'()*+,;=:@/0-9A-Za-df-z]
  | .{1,3}$
)

This regex will match 1-4 characters that do not form "hire". It does so by matching the minimum number of characters necessary to verify that the match is neither "hire" nor can serve as a prefix of "hire". It takes into account end-of-line (e.g. "hir" is valid if there is nothing else after it). The characters that it matches are all valid characters that can occur in the path component of a URL as specified in RFC 3986.

You use this regex by substituting it for every ((?!hire).) in the solutions given above (or, in the third solution, for the bare (?!hire) lookahead). For example, the third solution becomes:

.*?/client-thank-you/([-._~!$&'()*+,;=:@/0-9A-Za-gi-z]|h[-._~!$&'()*+,;=:@/0-9A-Za-hj-z]|hi[-._~!$&'()*+,;=:@/0-9A-Za-qs-z]|hir[-._~!$&'()*+,;=:@/0-9A-Za-df-z]|.{1,3}$).*

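This matches any URL that contains "/client-thank-you/" but not "/client-thank-you/hire". Because the substituted group has to consume at least one character, a URL that ends at "/client-thank-you/" with nothing after it does not match either.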

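Do be careful, though. When the group is repeated in place of ((?!hire).)*, doubled "h"s will make this workaround fail (e.g. "hhire"): the group consumes "hh" as one valid chunk, and the remaining "ire" then passes unchecked. However, if "hire" will only ever follow a path delimiter (i.e. /hire/), then that shouldn't be a problem.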
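To check the lookaround-free workaround and its caveat, here is a minimal Python sketch. It assumes whole-string matching with re.fullmatch; the names NOT_HIRE, SINGLE and REPEATED, and the sample paths, are hypothetical. SINGLE is the example regex given above; REPEATED substitutes the group for ((?!hire).) in the second solution.

```python
import re

# The lookaround-free group: 1-4 characters that neither are "hire"
# nor could still turn into "hire".
NOT_HIRE = (
    r"([-._~!$&'()*+,;=:@/0-9A-Za-gi-z]"
    r"|h[-._~!$&'()*+,;=:@/0-9A-Za-hj-z]"
    r"|hi[-._~!$&'()*+,;=:@/0-9A-Za-qs-z]"
    r"|hir[-._~!$&'()*+,;=:@/0-9A-Za-df-z]"
    r"|.{1,3}$)"
)

# Group used once, directly after the prefix (the example regex above).
SINGLE = re.compile(r".*?/client-thank-you/" + NOT_HIRE + r".*")
# Group repeated, standing in for ((?!hire).)* in the second solution.
REPEATED = re.compile(r".*?/client-thank-you/" + NOT_HIRE + r"*")


def matches(regex: re.Pattern, path: str) -> bool:
    """Return True when the regex matches the whole path."""
    return regex.fullmatch(path) is not None


print(matches(SINGLE, "/client-thank-you/anything/else"))
print(matches(SINGLE, "/client-thank-you/hire"))
print(matches(SINGLE, "/client-thank-you/hir"))
print(matches(REPEATED, "/client-thank-you/foo/hire"))
# The doubled-"h" caveat: this path contains "hire" but still matches.
print(matches(REPEATED, "/client-thank-you/hhire"))
```

The last call is the failure case: the lookahead version of the second solution rejects "/client-thank-you/hhire", while the repeated lookaround-free group accepts it.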

If you can't use a lookahead like Travis suggested, then I suggest setting the goal to fire on an event instead of a pageview.

If you're using Google Tag Manager, you'll have the ability to write a more advanced regex, or at least set a blocking rule for the event that prevents it from firing when 'hire' is in the page URL.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 