
Regex Multiple Negative Lookahead

Here is my regex pattern: [Ss]ection\\s\\d+(?![a-zA-z])(?!</ref>)

For example, it should match: section 5 or section 50

For example, it should not match: section 5A or section 5</ref> or section 5A</ref> or section 50A

The problem is that in practice it matches them incorrectly: http://regexr.com?33ien

Not sure what's wrong with the pattern though...

Maybe try [Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>). ++ is a possessive quantifier: it behaves like a greedy quantifier, except that it never gives back any of the text it matched to let a later part of the regex succeed.

Example

System.out.println("ababab".matches("(ab)++ab")); 
// prints false since last "ab" is possessed by (ab)++ 
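Applied to the question's inputs, the possessive version can be checked with a small program like the one below. This is a sketch: the class name `PossessiveSectionDemo` and the helper `firstMatch` are illustrative, not from the thread, and it uses `Matcher.find()` (not `String.matches`) so that a match inside a longer line would also be found. It writes `[a-zA-Z]` instead of the question's `[a-zA-z]`, because the `A-z` range also covers `[`, `\`, `]`, `^`, `_` and the backtick.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveSectionDemo {
    // \d++ is possessive: once it has taken every digit it never gives one back,
    // so both lookaheads are only ever tested right after the last digit
    static final Pattern SECTION =
            Pattern.compile("[Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>)");

    // Returns the first match found in the input, or null if there is none
    static String firstMatch(String input) {
        Matcher m = SECTION.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"section 5", "section 50", "section 5A",
                "section 5</ref>", "section 5A</ref>", "section 50A"};
        for (String s : inputs) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

Only "section 5" and "section 50" should produce a match; the other four inputs print null.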

The matches are not wrong: your regex asks for "section " followed by one or more digits that are not followed by a letter or by "</ref>".

That's true for section 50A:

section 5 is followed by 0A, which neither of your negative lookaheads rules out, so the regex matches section 5 inside section 50A.
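The backtracking described above can be observed directly by printing what the question's pattern (unchanged, including `[a-zA-z]`) actually matches. A minimal sketch; the class name `BacktrackingDemo` and helper `firstMatch` are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // The question's pattern as a Java string literal
    static final Pattern ORIGINAL =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-z])(?!</ref>)");

    // Returns the first match found in the input, or null if there is none
    static String firstMatch(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d+ first takes "50"; (?![a-zA-z]) fails on "A", so the engine
        // backtracks to "5", which is followed by "0" and passes both lookaheads
        System.out.println(firstMatch("section 50A")); // prints section 5
        // With a single digit there is nothing to give back, so there is no match
        System.out.println(firstMatch("section 5A"));  // prints null
    }
}
```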

You can do something like this:

[Ss]ection\s\d+(?![a-zA-Z0-9])(?!</ref>)
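As a quick check, here is that pattern run against the question's examples. It is a sketch: the class name `AlnumLookaheadDemo` and helper `firstMatch` are hypothetical, and the single backslashes in the regex become double backslashes in a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlnumLookaheadDemo {
    // Rejecting a following digit as well as a letter means backtracking into \d+
    // cannot help: any shorter digit run is followed by another digit
    static final Pattern SECTION =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z0-9])(?!</ref>)");

    // Returns the first match found in the input, or null if there is none
    static String firstMatch(String input) {
        Matcher m = SECTION.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"section 5", "section 50", "section 5A",
                "section 5</ref>", "section 5A</ref>", "section 50A"};
        for (String s : inputs) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```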

This one should work:

[Ss]ection\s\d+(?!\d)(?![a-zA-Z])(?!</ref>)

I've explained a problem with how we tend to think about regex lookaheads in "Strangeness with negative lookahead assertion in Java regular expression"; it applies here as well.

The situation here is slightly different: the negative lookahead succeeds when we don't want it to, because the matcher will settle for a shorter match of the part before the lookahead if that lets the expression as a whole match. That's why, when you use a lookahead, it's important to pin down where the input boundary is: a word boundary, an anchor such as $, or an assertion about the following text (not a digit, in my proposed solution).
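To compare two of the boundary options mentioned above, the sketch below runs the (?!\d) solution next to a word-boundary variant, `[Ss]ection\s\d+\b(?!</ref>)`. That variant is my own illustration, not a pattern from the thread; note that `\b` also treats `_` as a word character, so it would additionally reject input such as "section 5_". The class name `BoundaryDemo` and helper `found` are hypothetical.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // The answer's pattern: (?!\d) forces \d+ to end at the last digit
    static final Pattern NOT_DIGIT =
            Pattern.compile("[Ss]ection\\s\\d+(?!\\d)(?![a-zA-Z])(?!</ref>)");
    // Hypothetical word-boundary variant: \b after \d+ requires a non-word
    // character (or the end of input) next, which excludes letters and digits
    static final Pattern WORD_BOUNDARY =
            Pattern.compile("[Ss]ection\\s\\d+\\b(?!</ref>)");

    // True if the pattern matches anywhere in the input
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"section 5", "section 50", "section 5A",
                "section 5</ref>", "section 5A</ref>", "section 50A"};
        for (String s : inputs) {
            System.out.println(s + "  (?!\\d): " + found(NOT_DIGIT, s)
                    + "  \\b: " + found(WORD_BOUNDARY, s));
        }
    }
}
```

Both patterns should report true only for "section 5" and "section 50".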


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM