
Two greedy quantifiers in the same regex

If I have an unknown string of the structure:

"stuff I don't care about THING different stuff I don't care about THING ... THING even more stuff I don't care about THING stuff I care about"

I want to capture the "stuff I care about" which will always be after the last occurrence of THING. There is the potential for 0 occurrences of THING, or many. If there are 0 occurrences then there is no stuff I care about. The string can't start or end with THING.

Some possible strings:

"stuff I don't care about THING stuff I care about"

"stuff I don't care about"

Some strings that are not possible:

"THING stuff I care about"

"stuff I don't care about THING stuff I don't care about THING"


My current solution to this problem is to use a regex with two greedy quantifiers as follows:

if( /.*THING(.*)/ ) {
    $myStuff = $1;
}

It seems to be working, but my question is about how the two greedy quantifiers will interact with each other. Is the first (leftmost) greedy quantifier always "more greedy" than the second?

Basically am I guaranteed not to get a split like the following:

"stuff I don't care about THING"

$1 = "different stuff I don't care about THING even more stuff I don't care about THING stuff I care about"

Compared to the split I do want:

"stuff I don't care about THING different stuff I don't care about THING even more stuff I don't care about THING"

"stuff I care about"

Perl's regex engine returns the leftmost match, and each greedy quantifier takes as much as it can while still allowing the rest of the pattern to match. The first wildcard will initially match through to the end of the line, then backtrack one character at a time until the rest of the regex yields a match, i.e. until the THING that follows it is the last THING in the string.

During the matching process, .*THING will initially match everything up to and including the last occurrence of THING.

If there is no way the rest of the pattern can match, it will backtrack by becoming shorter, match everything up to and including the last-but-one occurrence of THING, and attempt the rest of the pattern again.

However, the rest of the pattern is .*, which will always match because it can match an empty string.

Therefore, .*THING(.*) will match up to and including the last occurrence of THING, and will match and capture the rest of the string.
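As a small sketch of that behaviour, the snippet below runs the pattern from the question against a short made-up string and, for contrast, runs a variant whose first quantifier is non-greedy. The helper name `after_last_thing` and the sample strings are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the last THING,
# or undef when the string contains no THING at all.
sub after_last_thing {
    my ($text) = @_;
    return $text =~ /.*THING(.*)/ ? $1 : undef;
}

my $input = "a THING b THING c";

# Greedy first quantifier: .* runs to the end of the string and backtracks
# only as far as the last THING, so the capture is " c" (leading space kept).
my $greedy_tail = after_last_thing($input);

# Non-greedy first quantifier, shown only for contrast: .*? stops at the
# first THING, so the capture is " b THING c".
my ($lazy_tail) = $input =~ /.*?THING(.*)/;

print "greedy: '$greedy_tail'\n";
print "lazy:   '$lazy_tail'\n";
```

The second quantifier never gets a chance to "steal" an earlier THING, because the first one has already claimed everything up to the last THING before the capture group starts matching.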

Note that . will match anything except a newline. If there could be newlines in your text, then you will want to use the /s modifier so that . matches any character at all.
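To make the effect of /s concrete, here is a minimal sketch using a made-up two-line string with THING on each line. The variable names and sample text are assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Sample text with a THING on each of two lines.
my $text = "a THING b\nc THING d";

# Without /s, . cannot cross the newline, so the match stays on the
# first line and the capture is " b".
my ($without_s) = $text =~ /.*THING(.*)/;

# With /s, . also matches the newline, so the first .* reaches the
# last THING in the whole string and the capture is " d".
my ($with_s) = $text =~ /.*THING(.*)/s;

print "without /s: '$without_s'\n";
print "with /s:    '$with_s'\n";
```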

Note also that if the pattern fails to match (because, say, there is no THING in the string), then $1 will remain unchanged. It will still contain whatever it was set to by the most recent successful pattern match. This means that you must check the status of the pattern match before using the value of $1.
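The stale-$1 pitfall can be sketched as follows. The sample strings and variable names are made up for illustration. The last line shows one possible way to avoid the problem: in list context a failed match returns an empty list, so the variable ends up undefined instead of inheriting an old capture.

```perl
use strict;
use warnings;

my $first  = "keep THING wanted";
my $second = "no marker here";

# First match succeeds and sets $1 to " wanted".
my $first_ok    = $first =~ /.*THING(.*)/;
my $after_first = $1;

# Second match fails, but $1 is NOT reset: it still holds " wanted".
my $second_ok    = $second =~ /.*THING(.*)/;
my $after_second = $1;

# Safer: assign the captures in list context, which yields undef on failure.
my ($safe_tail) = $second =~ /.*THING(.*)/;

print "second match succeeded? ", ($second_ok ? "yes" : "no"), "\n";
print "stale \$1 after failed match: '$after_second'\n";
print "list-context capture: ", (defined $safe_tail ? "'$safe_tail'" : "undef"), "\n";
```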

Here is my take.

/^(?!THING).+THING((?:(?!THING).)+)$/

Accepts a string with 1 or more occurrences of THING. THING cannot be at the beginning or end of the string. It gets the text after the last time THING appears.
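A quick way to check that description is to run the stricter pattern over a few made-up strings, including the two "not possible" shapes from the question. This is only a sketch; the sample strings and the `%outcome` hash are assumptions for illustration.

```perl
use strict;
use warnings;

# The stricter pattern from this answer, precompiled.
my $strict = qr/^(?!THING).+THING((?:(?!THING).)+)$/;

my @candidates = (
    "a THING b THING c",   # valid: capture is the text after the last THING
    "THING b",             # starts with THING: rejected
    "a THING b THING",     # ends with THING: rejected
    "no marker",           # no THING at all: rejected
);

my %outcome;
for my $candidate (@candidates) {
    # Store the capture on success, undef on failure.
    $outcome{$candidate} = $candidate =~ $strict ? $1 : undef;
}

for my $candidate (@candidates) {
    my $result = $outcome{$candidate};
    printf "%-20s => %s\n", "'$candidate'",
        defined $result ? "'$result'" : "no match";
}
```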

Edit: Added check for 'THING' at the beginning of the string.

EDIT: Wow, rereading your specs (which I really misread), you said: "If there are 0 occurrences then there is no stuff I care about. The string can't start or end with THING."

Then your regex is fine. tripleee explained the situation well.

