Here is my regex pattern: [Ss]ection\\s\\d+(?![a-zA-z])(?!</ref>)
For example, it should match: section 5
or section 50
For example, it should not match: section 5A
or section 5</ref>
or section 5A</ref>
or section 50A
The problem is that in practice it produces the wrong matches: http://regexr.com?33ien
I'm not sure what's wrong with the pattern, though.
Maybe try [Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>)
++ is a possessive quantifier. It is similar to a greedy quantifier, except that it keeps the part of the string it matched from being given back for use by later parts of the regex.
Example
System.out.println("ababab".matches("(ab)++ab"));
// prints false since last "ab" is possessed by (ab)++
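To see how the possessive version behaves on the inputs from the question, here is a minimal sketch. The class name PossessiveDemo and the helper firstMatch are made up for illustration, and the character class is written as [a-zA-Z] on the assumption that this is what was intended.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question or answer.
class PossessiveDemo {
    // \d++ is possessive: once it has consumed the digits it never gives one back,
    // so the lookaheads are always tested right after the last digit.
    static final Pattern POSSESSIVE =
            Pattern.compile("[Ss]ection\\s\\d++(?![a-zA-Z])(?!</ref>)");

    // Returns the first match found in the input, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher m = POSSESSIVE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "section 5", "section 50", "section 5A",
            "section 5</ref>", "section 5A</ref>", "section 50A"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(s));
        }
    }
}
```

With these samples only "section 5" and "section 50" should produce a match; the other four give null.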
The matches are not wrong: your regex asks for "section " followed by one or more digits that are not followed by a letter or "</ref>".
That holds for section 50A: the regex can stop at section 5, which is followed by 0A, and a digit is not excluded by your negative lookahead.
You can do something like:
[Ss]ection\s\d+(?![a-zA-Z0-9])(?!</ref>)
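The backtracking described above can be checked with a small sketch. ORIGINAL is the pattern from the question, copied as written, and FIXED is the pattern suggested here; the class name BacktrackDemo and the helper firstMatch are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns on the same input.
class BacktrackDemo {
    // Pattern from the question: greedy \d+ may give back digits when a lookahead fails.
    static final Pattern ORIGINAL =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-z])(?!</ref>)");
    // Suggested pattern: the lookahead also rejects a following digit,
    // so stopping in the middle of the number no longer helps.
    static final Pattern FIXED =
            Pattern.compile("[Ss]ection\\s\\d+(?![a-zA-Z0-9])(?!</ref>)");

    // Returns the first match found in the input, or null when nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "section 50A";
        System.out.println(firstMatch(ORIGINAL, input)); // section 5
        System.out.println(firstMatch(FIXED, input));    // null
    }
}
```

The same thing happens with an input such as "section 50</ref>": the original pattern falls back to "section 5", while the fixed one finds nothing.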
This one should work:
[Ss]ection\s\d+(?!\d)(?![a-zA-Z])(?!</ref>)
I've explained a problem with how we think about regexp lookaheads in "Strangeness with negative lookahead assertion in Java regular expression"; it applies here as well.
The situation here is slightly different: the negative lookahead succeeds when we don't want it to, because the matcher will accept a shorter match for the part before the lookahead if that lets the expression as a whole match. That's why it's important to pin down the input boundary when you use a lookahead: a word boundary, an anchor such as $, or some assertion about the following text (not looking at a digit, in my proposed solution).
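As a rough sketch of the boundary idea, the code below compares the "not looking at a digit" pattern with a word-boundary variant. The \b variant is my own illustration rather than something proposed in the thread, and the names BoundaryDemo, NO_DIGIT, WORD_BOUNDARY and firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing two ways to pin down where the number ends.
class BoundaryDemo {
    // Assertion about the following text: the number may not be followed by another digit.
    static final Pattern NO_DIGIT =
            Pattern.compile("[Ss]ection\\s\\d+(?!\\d)(?![a-zA-Z])(?!</ref>)");
    // Assumed alternative: \b requires a word boundary right after the digits,
    // which rules out a following digit or letter in one step.
    static final Pattern WORD_BOUNDARY =
            Pattern.compile("[Ss]ection\\s\\d+\\b(?!</ref>)");

    // Returns the first match found in the input, or null when nothing matches.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"section 50", "section 50A", "section 50</ref>"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(NO_DIGIT, s)
                    + " / " + firstMatch(WORD_BOUNDARY, s));
        }
    }
}
```

Both patterns should agree on the inputs from the question. They can differ elsewhere, since \b also treats an underscore as part of a word, so which one fits depends on what may legitimately follow the number in your data.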