
How to match all words but “stop” in a string by regex

Another regex question. I use PHP and have a string: fdjkaljfdlstopfjdslafdj. As you can see, there is a stop in the middle. I want to replace everything else and leave that stop alone. I tried [^stop], but the exclusion also catches the s near the end of the string.



My Solution

Thanks for everyone's help here.

I also figured out a solution using plain regex (that is, within the scope of my regex knowledge; PCRE verbs are too advanced for me). It takes 2 steps, though. I don't want to mix PHP functions in, because sometimes the job is outside of coding, e.g. batch-renaming files in Total Commander.

Take the string xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq. Say I want to keep every foo in this string and replace each run of non-foo text with ---.

First, I need to find the non-foo text between each pair of foos: (?<=foo).+?(?=foo). The string turns into xxxfoo---foo---foo---foolsjunegpq, so only the non-foo text at both ends is left.

Then use [^-]+(?=foo)|(?<=foo)[^-]+. This time the result is ---foo---foo---foo---foo---. Everything but foo has been turned into ---.
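For anyone who wants to check the two steps in PHP, here is a minimal sketch that runs them back to back with preg_replace. The variable names are only for illustration, and it assumes the input contains no - characters of its own, since the second pattern relies on that.

```php
<?php
// Sample string from the post above.
$input = 'xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq';

// Step 1: replace each run that sits between two "foo"s.
$step1 = preg_replace('/(?<=foo).+?(?=foo)/', '---', $input);

// Step 2: replace the leftover runs before the first and after the last "foo".
// [^-]+ keeps the pattern from touching the "---" inserted in step 1.
$step2 = preg_replace('/[^-]+(?=foo)|(?<=foo)[^-]+/', '---', $step1);

echo $step1, PHP_EOL; // xxxfoo---foo---foo---foolsjunegpq
echo $step2, PHP_EOL; // ---foo---foo---foo---foo---
```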

I just don't want to include "stop"...

You can skip it with the PCRE verbs (*SKIP)(*F). Try it like this:

stop(*SKIP)(*F)|.


Or, to match each run as a sequence: (stop)(*SKIP)(*F)|(?:(?!(?1)).)+

Or, for words: stop(*SKIP)(*F)|\w+
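A quick sketch of how the three patterns behave with preg_replace. The replacement strings and the sentence 'please stop here' are made up for the example. Note that the \w+ variant is meant for text where the words are already separated; on the original string \w+ would simply run straight through stop.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// Per character: "stop" is skipped, every other character becomes "-".
$perChar = preg_replace('/stop(*SKIP)(*F)|./', '-', $s);

// Per run: each stretch that does not contain "stop" becomes a single "---".
$perRun = preg_replace('/(stop)(*SKIP)(*F)|(?:(?!(?1)).)+/', '---', $s);

// Per word, on a hypothetical sentence with separate words.
$perWord = preg_replace('/stop(*SKIP)(*F)|\w+/', '---', 'please stop here');

echo $perChar, PHP_EOL; // ----------stop---------
echo $perRun, PHP_EOL;  // ---stop---
echo $perWord, PHP_EOL; // --- stop ---
```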

[^stop] doesn't mean "any text that is not stop". It means any single character that is not one of the 4 characters inside [...], which in this case are s, t, o and p.
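You can see this on the string from the question. In this small sketch (the - replacement is just for illustration), the lone s near the end survives together with stop, because the class only looks at single characters:

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdj';

// Every character other than s, t, o, p is replaced, one at a time.
$result = preg_replace('/[^stop]/', '-', $s);

echo $result, PHP_EOL; // ----------stop---s-----
```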

Better to split on the text you don't want to match:

php> $s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

php> $arr = preg_split('/stop/', $s);

php> print_r($arr);
Array
(
    [0] => fdjkaljfdl
    [1] => fjdslafdj
    [2] => foobar
)
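If the goal is to replace the other parts rather than just collect them, one possible follow-up is to capture the delimiter while splitting and then rebuild the string. This is only a sketch; the --- replacement is an assumption borrowed from elsewhere on this page.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps "stop" in the result;
// PREG_SPLIT_NO_EMPTY drops empty pieces (e.g. when the string starts with "stop").
$parts = preg_split('/(stop)/', $s, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Replace every piece that is not the delimiter itself, then glue it back together.
$rebuilt = implode('', array_map(function ($part) {
    return $part === 'stop' ? $part : '---';
}, $parts));

echo $rebuilt, PHP_EOL; // ---stop---stop---
```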

You can generalize this to any pattern:

(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))


Just put the pattern you don't want in the neg group.

This regex will try to do the following for any character position:

  • Match the pattern you don't want. If it matches, discard it with (*SKIP)(*FAIL) and start the next match attempt right after it.
  • If the pattern you don't want doesn't match at a particular position, then match anything, until either:
    • You reach the end of the input string ( \Z )
    • Or the pattern you don't want immediately follows the current matching position ( (?&neg) )
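The behaviour described above can be checked with preg_match_all. A minimal sketch, reusing the sample string from the preg_split answer on this page:

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// The unwanted pattern lives in the named group "neg".
$pattern = '/(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))/';

preg_match_all($pattern, $s, $matches);

// $matches[0] holds the full matches; the "neg" entries stay empty because
// that branch never takes part in a successful match.
print_r($matches[0]); // fdjkaljfdl, fjdslafdj, foobar
```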

This approach is slower than a hand-tuned expression. You can get better performance, at the cost of repeating yourself, by inlining the pattern, which avoids the recursion:

stop(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|stop)
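If the unwanted text is a plain literal, the repetition can be generated instead of typed twice. The helper below is hypothetical (matchAllExcept is not a PHP built-in) and is only a sketch; it assumes a non-empty literal and uses preg_quote so that special characters are not read as regex syntax.

```php
<?php
// Hypothetical helper: returns every run of text that does not contain $literal.
function matchAllExcept(string $literal, string $subject): array
{
    $neg = preg_quote($literal, '/');
    // Same shape as the inlined pattern above, with the literal inserted twice.
    $pattern = '/' . $neg . '(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|' . $neg . ')/';
    preg_match_all($pattern, $subject, $matches);
    return $matches[0];
}

// (?s:.) also matches newlines, so a run may span several lines.
print_r(matchAllExcept('stop', "ab\ncdstopef")); // "ab\ncd", "ef"
```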

But of course, the best approach would be to use the features provided by your language: match the string you don't want, then use code to discard it and keep everything else.

In PHP, you can use the PREG_OFFSET_CAPTURE flag to tell the preg_match_all function to provide you the offsets of each match.
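A rough sketch of that idea: match only stop, then use the reported offsets to slice out everything in between. The variable names are illustrative, and the offsets are byte offsets, which is fine for this ASCII sample.

```php
<?php
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';

// With PREG_OFFSET_CAPTURE each entry of $matches[0] is [matchedText, byteOffset].
preg_match_all('/stop/', $s, $matches, PREG_OFFSET_CAPTURE);

$kept = [];
$cursor = 0; // end of the previous unwanted match
foreach ($matches[0] as [$text, $offset]) {
    if ($offset > $cursor) {
        $kept[] = substr($s, $cursor, $offset - $cursor);
    }
    $cursor = $offset + strlen($text);
}
// Whatever follows the last unwanted match.
if ($cursor < strlen($s)) {
    $kept[] = substr($s, $cursor);
}

print_r($kept); // fdjkaljfdl, fjdslafdj, foobar
```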


 