
Make the OR operator (alternation) in a regular expression greedy

I need to match asd["]"] inside the string asd["]"] asd

I am using this regular expression:

/([a-z]+?(\[[^,\]]*?\]|\[\".*\"\]))/u

but it only matches asd["]

If I swap the order of the alternatives in the regular expression:

/([a-z]+?(\[\".*\"\]|\[[^,\]]*?\]))/u

I get the desired result, but I suspect that some other cases would then stop working. This is a minified version of my real regular expression.

How can I make the regular expression choose the longest possible match (i.e. make the alternation greedy)?
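A quick way to see what is going on, sketched with Python's re module standing in for PHP's preg_match (an assumption: these patterns use no PCRE-only features, so both engines should behave the same here). At a given starting position the engine tries the alternatives from left to right and keeps the first one that lets the whole pattern succeed; it does not compare their lengths.

```python
import re

subject = 'asd["]"] asd'

# Original order: the lazy [^,\]]*? alternative is tried first and already succeeds.
first_order = re.compile(r'([a-z]+?(\[[^,\]]*?\]|\[".*"\]))')
# Swapped order: the quoted alternative is tried first, so it wins instead.
swapped_order = re.compile(r'([a-z]+?(\[".*"\]|\[[^,\]]*?\]))')

print(first_order.search(subject).group(0))    # asd["]
print(swapped_order.search(subject).group(0))  # asd["]"]
```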

Edit:

With the regex:

/{((\"a\")|([^b]*)})/u

I get

{c {"a"}

from

{b{c {"a"} b}

In this case the regex chose the second alternative, which is longer than the first.
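The edit can be checked the same way (again assuming Python's re as a stand-in for PCRE; the braces are escaped here, whereas PCRE reads a { that does not start a quantifier as a literal, as the output above confirms). The search reports the leftmost start position at which any alternative succeeds. At index 0 neither alternative fits; at index 2 the first alternative cannot match "c", so the second one is used, not because it is longer. Forcing the match to start at the inner { (index 5) shows the first alternative winning when it can.

```python
import re

# The asker's pattern with { and } escaped; group 2 is the "a" branch,
# group 3 is the [^b]* part of the second branch.
pattern = re.compile(r'\{(("a")|([^b]*)\})')
subject = '{b{c {"a"} b}'

m = pattern.search(subject)
print(m.start(), m.group(0))   # 2 {c {"a"}
print(m.group(2), m.group(3))  # None c {"a"

# Anchoring the attempt at index 5 (the inner "{"): the first alternative succeeds.
inner = pattern.match(subject, 5)
print(inner.group(0), inner.group(2))  # {"a" "a"
```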

For some reason, you made the alternative that does not need it ungreedy, and left greedy the one that actually needs to be ungreedy (the .* between the quotes). To solve your problem with regular expressions (although it will have some caveats), you should probably use negated character classes in both alternatives:

'/([a-z]+?)\[([^"\[\]]*|"[^"]*")\]/'

This should work fine for your example. It will find the innermost asd[something here] or asd["something with [][] here"].
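A minimal check of the negated-class pattern, with the PHP string quotes and slash delimiters dropped and Python's re assumed as a stand-in for PCRE:

```python
import re

# Inside the brackets: either text with no quotes or brackets at all,
# or one double-quoted string, which may itself contain [ and ].
pattern = re.compile(r'([a-z]+?)\[([^"\[\]]*|"[^"]*")\]')

subjects = ['asd["]"] asd', 'asd[something here]', 'asd["something with [][] here"]']
for subject in subjects:
    print(pattern.search(subject).group(0))
# asd["]"]
# asd[something here]
# asd["something with [][] here"]
```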

As for the caveats: in the non-quoted case this cannot find nested occurrences. In asd[b efg[something]] it will match efg[something] and not the outer brackets. However, even if it could, you would lose the inner match, because matches cannot overlap. If you want to find the outermost valid brackets (so only the full string in the given example), you should look into PCRE's recursion capabilities. Just note that you have to decide between innermost and outermost: neither preg_match nor preg_match_all will find all nested matches for you.
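The nesting caveat can be seen directly with finditer on the string from the paragraph above (Python's re assumed again; it has no recursion support, so the PCRE recursion approach is not sketched here).

```python
import re

pattern = re.compile(r'([a-z]+?)\[([^"\[\]]*|"[^"]*")\]')

# The outer asd[...] cannot match because its content contains "[",
# so only the innermost bracket pair is reported.
matches = [m.group(0) for m in pattern.finditer('asd[b efg[something]]')]
print(matches)  # ['efg[something]']
```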

This regex seemed to work for me:

/([a-z]+(\[[^,]*\]|\[\".*\"\]))/

In it I simply removed the \] inside the [^,\]] part of your original regex, and removed all non-greedy quantifiers, though they seemed to have no effect anyway.
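A quick test of this pattern, with Python's re standing in for PCRE. The second input, asd["]"] qwe[x], is a hypothetical string added here to show one side effect of dropping \] from the class: [^,]* can run across a ] and is only stopped by a comma, so a later bracket on the same line is pulled into the match.

```python
import re

pattern = re.compile(r'([a-z]+(\[[^,]*\]|\[".*"\]))')

print(pattern.search('asd["]"] asd').group(0))     # asd["]"]
# Hypothetical input: [^,]* runs past the first "]" up to the last "]" it can reach.
print(pattern.search('asd["]"] qwe[x]').group(0))  # asd["]"] qwe[x]
```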
If you're looking for nested structures, matching "anything except the closing symbol" will never find the longest match, because it always stops at the first closing character it meets (the innermost one), so you have to pick something else that is unique inside the enclosure.
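A small illustration of that point, assuming Python's re and hypothetical inputs: a class such as [^\]]* stops at the first ] it reaches, while using the quote pair as the delimiter (one possible choice of "something unique inside the enclosure") lets a ] sit inside the quotes.

```python
import re

# "Anything except the closing bracket": stops at the first ] it meets.
not_closing = re.compile(r'\[[^\]]*\]')
print(not_closing.search('[x[y]z]').group(0))  # [x[y]

# Quotes as the delimiter: the ] between the quotes no longer ends the match.
quoted = re.compile(r'\["[^"]*"\]')
print(quoted.search('asd["]"] asd').group(0))  # ["]"]
```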
