All the Q&As I found about matching everything except a word with a negative look-ahead rely on line start/end anchors (^ and $). But I can't figure out how to match everything (any character, like .*) except a word that appears before some other word in the middle of the processed text.
I need to match ABC inside <tag></tag>:
...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>...
but not when it is outside (a false positive):
...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>...
So I think I should exclude the closing tag (</tag>) before ABC. I tried:
<tag>(?!<\/tag>)ABC.*?<\/tag>
but written this way it doesn't let me match .* (anything except </tag>) before ABC. How can I implement this?
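A quick way to see why the attempt fails, sketched with Python's re module (an assumption: the question targets Perl/PCRE, but this particular pattern behaves the same way there). The look-ahead is tested only once, right after <tag>, and ABC must then follow immediately, so neither sample string matches. The \/ escapes are dropped because Python patterns have no / delimiters.

```python
import re

# The two sample strings from the question.
POSITIVE = "...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>..."
NEGATIVE = "...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>..."

# The attempted pattern: (?!</tag>) is checked at a single position,
# and then ABC has to start exactly there.
attempt = re.compile(r"<tag>(?!</tag>)ABC.*?</tag>")

print(attempt.search(POSITIVE))  # None: "a a.__a" sits between <tag> and ABC
print(attempt.search(NEGATIVE))  # None
# Only ABC placed directly after <tag> is found:
print(attempt.search("<tag>ABC</tag>").group())  # <tag>ABC</tag>
```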
Since you're using Perl or PCRE, the fastest way to do it is with the (*SKIP)(*FAIL) backtracking verbs:
/(?s)<tag>(?:<\/tag>(*SKIP)(*FAIL)|.)*?ABC.*?<\/tag>/
https://regex101.com/r/AoiwIH/1
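Expanded (free-spacing layout, as with the x flag):
 (?s)                     # let . match newlines too
 <tag>                    # opening tag
 (?:                      # lazily consume, one step at a time:
      </tag>              #   a closing tag before ABC means this element has no ABC,
      (*SKIP) (*FAIL)     #   so fail and restart the search after that </tag>
   |                      # or
      .                   #   any other single character
 )*?
 ABC                      # the word we want
 .*?                      # then anything, lazily,
 </tag>                   # up to the nearest closing tag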
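To illustrate what (*SKIP)(*FAIL) does here, a hand-written Python scanner that mimics the pattern. find_tagged_word is a hypothetical helper for illustration, not part of any library, and the tag names and ABC are hard-coded from the question. The key step is the last line of the loop: when </tag> shows up before ABC, the search resumes after that closing tag instead of one character further on, which is exactly what (*SKIP) tells the regex engine to do.

```python
# Hypothetical helper for illustration only; constants taken from the question.
OPEN, CLOSE, WORD = "<tag>", "</tag>", "ABC"


def find_tagged_word(text):
    """Return the first '<tag>...ABC...</tag>' substring of text, or None.

    Mimics (?s)<tag>(?:</tag>(*SKIP)(*FAIL)|.)*?ABC.*?</tag> by hand.
    """
    pos = 0
    while True:
        start = text.find(OPEN, pos)
        if start == -1:
            return None                     # no opening tag left to try
        body = start + len(OPEN)
        close = text.find(CLOSE, body)      # nearest closing tag after <tag>
        if close == -1:
            return None                     # no closing tag anywhere after this point
        hit = text.find(WORD, body)         # first ABC after <tag>
        if hit != -1 and hit < close:
            # ABC sits inside this element. "ABC" and "</tag>" cannot overlap,
            # so the nearest closing tag after ABC is `close` itself.
            return text[start:close + len(CLOSE)]
        # The closing tag came first: the (*SKIP)(*FAIL) step.
        # Resume the search after "</tag>" instead of at start + 1.
        pos = close + len(CLOSE)


POSITIVE = "...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>..."
NEGATIVE = "...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>..."
print(find_tagged_word(POSITIVE))  # <tag>a a.__aABC&*</tag>
print(find_tagged_word(NEGATIVE))  # None
```

The regex engine does the same work in one pass; the scanner only makes the skip explicit.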
Benchmark comparison with the assertion (tempered look-ahead) method:
Regex1: (?s)<tag>(?:</tag>(*SKIP)(*FAIL)|.)*?ABC.*?</tag>
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.25 s, 254.91 ms, 254905 µs
Matches per sec: 196,151
Regex2: (?s)<tag>(?:(?!</tag>).)*?ABC.*?</tag>
Completed iterations: 50 / 50 ( x 1000 )
Matches found per iteration: 1
Elapsed Time: 0.33 s, 329.10 ms, 329095 µs
Matches per sec: 151,931
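Python's built-in re module does not support backtracking control verbs such as (*SKIP)(*FAIL), so in Python the assertion method (Regex2 above) is the practical choice. A minimal sketch, assuming the same sample strings as the question; the inline (?s) flag is kept at the start of the pattern, as Python requires for global inline flags.

```python
import re

POSITIVE = "...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>..."
NEGATIVE = "...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>..."

# Regex2 from the benchmark: each step of the lazy loop first checks
# that "</tag>" does not start here, then consumes one character.
tempered = re.compile(r"(?s)<tag>(?:(?!</tag>).)*?ABC.*?</tag>")

for text in (POSITIVE, NEGATIVE):
    m = tempered.search(text)
    print(m.group() if m else None)
# <tag>a a.__aABC&*</tag>
# None
```

Because of (?s), the element may also span lines, e.g. "<tag>a\nABC</tag>" is matched as a whole.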