How to delete a pattern when it is not found between two symbols in Perl?

I have a document like this:

Once upon a time, there lived a cat.
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other cats from many AAAAAA cities ZZZZZZ.
The cat knew brown cats and AAAAAA green catsZZZZZZ and red cats.

AAAAAA and ZZZZZZ act like { and }, but are used instead to avoid problems with other scripts that might interpret { and } differently.

I need to delete every occurrence of "cat" that is not between an AAAAAA and a ZZZZZZ. The desired output is:

Once upon a time, there lived a .
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other s from many AAAAAA cities ZZZZZZ.
The  knew brown s and AAAAAA green catsZZZZZZ and red s.
  • Every AAAAAA has a matching ZZZZZZ.
  • An AAAAAA and its matching ZZZZZZ are never split across lines.
  • AAAAAA/ZZZZZZ pairs are never nested.
  • The pattern ("cat" in the example above) is not treated as a word. It could be anything.

I have tried several things, e.g.:

perl -pe 's/[^AAAAAAA](.*)(cat)(.*)[^BBBBBBB]//g' <<< "AAAAAAA cat 1 BBBBBBB cat 2"

How can I delete a pattern wherever it is not found between a matching pair of delimiters?

There are several possible approaches:

  1. Use the \K escape to drop the part you don't want from the match result:

     s/AAAAAA.*?ZZZZZZ\K|cat//gs

    (\K removes everything to its left from the match result, but those characters are still consumed by the regex engine. Consequently, when the first branch of the alternation succeeds, the match is the empty string immediately after ZZZZZZ, which is replaced with an empty string, so the delimited text is left untouched.)

  2. Use a capturing group and put the substring you want to preserve back, unchanged, into the replacement string (via the reference $1):

     s/(AAAAAA.*?ZZZZZZ)|cat/$1/gs 

  3. Use the backtracking control verbs (*SKIP) and (*FAIL) to skip the delimited substring and never retry it:

     s/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|cat//gs 

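    ((*SKIP) tells the regex engine that, if the pattern fails later, it must not retry the match from any position inside the substring matched to its left. (*FAIL) then forces the pattern to fail. As a result, the engine skips over each AAAAAA...ZZZZZZ block and resumes searching for cat right after it.)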
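The three substitutions above can be checked side by side with a small, self-contained script. This is a sketch, assuming Perl 5.10 or later (for \K, (*SKIP), (*FAIL) and the defined-or operator //). It applies each substitution to the sample document from the question and dies if the result differs from the expected output. The sub names strip_with_keep, strip_with_capture and strip_with_skip are illustrative only. One deliberate change from variant 2 as written: when the cat branch matches, $1 is undefined, so under `use warnings` a plain $1 replacement emits "Use of uninitialized value" warnings; the sketch uses /e with `$1 // ''` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample document and expected result, copied from the question.
my $input = <<'END';
Once upon a time, there lived a cat.
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other cats from many AAAAAA cities ZZZZZZ.
The cat knew brown cats and AAAAAA green catsZZZZZZ and red cats.
END

my $expected = <<'END';
Once upon a time, there lived a .
The AAAAAA cat was ZZZZZZ very happy.
The AAAAAAcatZZZZZZ knew many other s from many AAAAAA cities ZZZZZZ.
The  knew brown s and AAAAAA green catsZZZZZZ and red s.
END

# Variant 1: \K keeps the AAAAAA...ZZZZZZ block out of the match, so a
# block "match" is the empty string right after ZZZZZZ, replaced by "".
sub strip_with_keep {
    my ($text) = @_;
    $text =~ s/AAAAAA.*?ZZZZZZ\K|cat//gs;
    return $text;
}

# Variant 2: capture the block and put it back. $1 is undef when the
# "cat" branch matched, so /e with // substitutes "" without a warning.
sub strip_with_capture {
    my ($text) = @_;
    $text =~ s{(AAAAAA.*?ZZZZZZ)|cat}{$1 // ''}gse;
    return $text;
}

# Variant 3: match a block, then (*SKIP)(*FAIL) so the engine resumes
# after it; only the "cat" branch ever produces a real match.
sub strip_with_skip {
    my ($text) = @_;
    $text =~ s/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|cat//gs;
    return $text;
}

for my $strip (\&strip_with_keep, \&strip_with_capture, \&strip_with_skip) {
    die "unexpected output\n" unless $strip->($input) eq $expected;
}
print "all three variants produce the expected output\n";
```

Each sub works on a copy of its argument, so the original text is never modified in place.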

Note: if an AAAAAA and its matching ZZZZZZ are always on the same line, you can drop the /s modifier and process the data line by line.
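For the line-by-line case in the note, here is a minimal sketch using variant 3 without /s. Because the question says the pattern "could be anything", it assumes the pattern is a literal string and escapes it with \Q...\E (quotemeta), so characters such as . or * are not treated as regex metacharacters; remove the \Q...\E if you want the pattern interpreted as a regex. The sub name strip_line and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Delete every literal occurrence of $pattern in one line of text, except
# inside AAAAAA...ZZZZZZ pairs. \Q...\E (quotemeta) escapes regex
# metacharacters, so the pattern "a.b" matches only the text "a.b".
sub strip_line {
    my ($line, $pattern) = @_;
    # No /s modifier: '.' cannot match "\n", so a block never runs past
    # the end of the current line.
    $line =~ s/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|\Q$pattern\E//g;
    return $line;
}

# Hypothetical input, handled one line at a time as in a while (<$fh>) loop.
my @lines = (
    "Once upon a time, there lived a cat.\n",
    "The AAAAAA cat was ZZZZZZ very happy.\n",
);
print strip_line($_, 'cat') for @lines;

# "axb" survives because the "." in the pattern is escaped.
print strip_line("a.b axb AAAAAAa.bZZZZZZ\n", 'a.b');
```

From a shell, the same per-line substitution can be run as a one-liner, where -p reads the input one line at a time and prints each line after the substitution (input.txt is a placeholder file name):

     perl -pe 's/AAAAAA.*?ZZZZZZ(*SKIP)(*FAIL)|cat//g' input.txt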
