How to delete a pattern when it is not found between two symbols in Perl?

I have a document like this:

Once upon a time, there lived a cat.
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other cats from many AAAAAA cities ZZZZZZ.
The cat knew brown cats and AAAAAA green catsZZZZZZ and red cats.

The AAAAAA and ZZZZZZ markers act like { and } , but are used instead to avoid problems with other scripts that might interpret { and } differently.

I need to delete every occurrence of "cat" that is not between an AAAAAA and a ZZZZZZ , so the expected result is:

Once upon a time, there lived a .
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other s from many AAAAAA cities ZZZZZZ.
The  knew brown s and AAAAAA green catsZZZZZZ and red s.

Some constraints:

  • Every AAAAAA has a matching ZZZZZZ .
  • An AAAAAA and its matching ZZZZZZ are never split across lines.
  • AAAAAA / ZZZZZZ pairs are never nested.
  • The pattern ("cat" in the example above) is not treated as a word; it could be anything.

I have tried several things, for example:

perl -pe 's/[^AAAAAAA](.*)(cat)(.*)[^BBBBBBB]//g' <<< "AAAAAAA cat 1 BBBBBBB cat 2"

How can I delete any pattern when it is not found between some matching set of symbols?

There are several ways to do this:

  1. You can use the \K feature to remove the part you don't want from the match result:

     s/AAAAAA.*?ZZZZZZ\K|cat//gs 

    ( \K drops everything to its left from the match result, although those characters are still consumed by the regex engine. As a consequence, when the first branch of the alternation succeeds, you replace the empty string (immediately after ZZZZZZ) with an empty string.)

  2. You can use a capturing group to put the substring you want to preserve back, unchanged, into the replacement string (with the reference $1 ):

     s/(AAAAAA.*?ZZZZZZ)|cat/$1/gs 

  3. You can use backtracking control verbs to skip the matched substring and not retry it:

     s/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|cat//gs 

    ( (*SKIP) forces the regex engine to not retry the substring found on the left if the pattern fails later. (*FAIL) forces the pattern to fail.)

Note: if AAAAAA and ZZZZZZ must always be on the same line, you can remove the /s modifier and process the data line by line.
