
Perl not matching regex?

I'm trying to remove all the comments in a bunch of SGF files, and have come up with the following Perl command:

perl -pi -e 's/P?C\[(?:[^\]\\]++|\\.)*+\]//gm' *.sgf

I'm trying to match and remove a C or PC followed by a left bracket, then characters that aren't right brackets (if they are, they have to be escaped with a \), and then a right bracket.

I'm trying to match the following examples:

C[HelloBot9 [-\]: GTP Engine for HelloBot9 (white): HelloBot version 0.6.26.08]

PC[IA [-\]: GTP Engine for IA (black): GNU Go version 3.7.11
]

C[person [-\]: \\\]]

C[AyaMC [3k\]: GTP Engine for AyaMC (black): Aya version 6.61 : If you pass, AyaMC 
will pass. When AyaMC does not, please remove all dead stones.]

And some examples that shouldn't be matched:

XYZ[Other stuff \]]

C[stuff\]

PC[stuff\\\]

The regex works in several online regex testers (including a few that claim to be Perl regex testers), but for some reason it doesn't work on the command line. Help is appreciated.

You need to run perl with the -0777 option, which makes it read each file as a single string instead of line by line, so that text spanning several lines can match the pattern. So, using perl -0777pi -e instead of perl -pi -e will solve the issue.

I would also suggest optimizing the pattern a bit by unrolling the alternation group, thus making the matching process "linear":

s/P?C\[[^]\\]*(?:\\.[^]\\]*+)*]//sg

Note that if PC should be matched as a whole word, add \b before P.

Pattern details:

  • P?C\[ - either the PC[ or the C[ literal char sequence
  • [^]\\]* - zero or more chars other than \ and ]
  • (?:\\.[^]\\]*+)* - zero or more sequences of:
    • \\. - a literal \ and then any char (. also matches a newline here because of the s modifier)
    • [^]\\]*+ - zero or more chars other than ] and \ (matched possessively, no backtracking into the pattern)
  • ] - a literal ] symbol (note it does not have to be escaped outside a character class to denote a literal closing bracket)
