Regex match everything except word

Every Q&A I found about matching everything except a word with a negative lookahead assumes the pattern is anchored to the start or end of a line (^ and $). I can't figure out how to match any run of characters (like .*) that does not contain a given word, when that run has to come before some other word in the middle of the processed text.

I need to match ABC when it appears inside <tag></tag>:

...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>...

but not outside (false-positive):

...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>...

So I think I need to exclude the closing tag (</tag>) from whatever precedes ABC. I tried:

<tag>(?!<\/tag>)ABC.*?<\/tag>

but written this way it does not allow any text (.* minus </tag>) between <tag> and ABC. How can I implement this?
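
To make the problem concrete, here is a small Python sketch (Python is an assumption here, used only for illustration; the variable names are made up). It runs the attempted pattern and a naive lazy pattern against the two sample strings. The lookahead in the attempt is checked only once, at the position right after <tag>, so ABC has to follow <tag> immediately. The naive pattern has no guard at all, so it runs across </tag>.

```python
import re

# The two sample strings from the question.
inside = "...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>..."
outside = "...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>..."

# The attempted pattern: the lookahead guards a single position,
# so nothing is allowed between <tag> and ABC.
attempt = re.compile(r"<tag>(?!</tag>)ABC.*?</tag>")

# A naive lazy pattern: .*? is free to cross </tag>.
naive = re.compile(r"<tag>.*?ABC.*?</tag>", re.DOTALL)

attempt_on_inside = attempt.search(inside)   # misses the wanted match
naive_on_outside = naive.search(outside)     # false positive

print(attempt_on_inside)
print(naive_on_outside.group(0))
```

The first result shows the attempt rejecting even the valid input; the second shows the false positive that the closing-tag exclusion is meant to prevent.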

Since you're using Perl or PCRE, the fastest way I know to do this is with backtracking control verbs:

/(?s)<tag>(?:<\/tag>(*SKIP)(*FAIL)|.)*?ABC.*?<\/tag>/

https://regex101.com/r/AoiwIH/1

Expanded

 (?s)
 <tag>  
 (?:
      </tag>
      (*SKIP) (*FAIL) 
   |  
      . 
 )*?
 ABC
 .*? 
 </tag>

Benchmark comparison with the assertion (negative lookahead) method

Regex1:   (?s)<tag>(?:</tag>(*SKIP)(*FAIL)|.)*?ABC.*?</tag>
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    0.25 s,   254.91 ms,   254905 µs
Matches per sec:   196,151


Regex2:   (?s)<tag>(?:(?!</tag>).)*?ABC.*?</tag>
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   1
Elapsed Time:    0.33 s,   329.10 ms,   329095 µs
Matches per sec:   151,931
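
For engines without backtracking control verbs, the assertion form (Regex2) is the portable option. A minimal sketch in Python, whose standard re module does not support (*SKIP)(*FAIL), checked against the two sample strings from the question. The function name find_abc_in_tag is hypothetical, and re.DOTALL stands in for the inline (?s) flag.

```python
import re

# Regex2 from the benchmark: each character consumed before ABC is
# first checked by a negative lookahead, so the scan cannot step
# past a closing </tag>.
TEMPERED = re.compile(r"<tag>(?:(?!</tag>).)*?ABC.*?</tag>", re.DOTALL)


def find_abc_in_tag(text):
    """Return the first <tag>...</tag> span that contains ABC, or None."""
    match = TEMPERED.search(text)
    return match.group(0) if match else None


inside = "...<tag>a a.__aABC&*</tag>aaa<tag>ffff</tag>..."
outside = "...<tag>a a.__a&*</tag>ABC<tag>ffff</tag>..."

print(find_abc_in_tag(inside))   # ABC is inside the first tag pair
print(find_abc_in_tag(outside))  # ABC sits between tag pairs
```

This only checks correctness on the two samples; it says nothing about relative speed, which depends on the engine and the input.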
