
How can I match multi-line patterns in the command line with perl-style regex?

I regularly use regex to transform text.

To transform giant text files from the command line, perl lets me do this:

perl -pe 's/.../.../g' < in.txt > out.txt

But this is inherently on a line-by-line basis. Occasionally, I want to match on multi-line things.

How can I do this in the command line?

To slurp a file instead of processing it line by line, use the -0777 switch:

perl -0777 -pe 's/.../.../g' in.txt > out.txt

As documented in perlrun, under Command Switches:

The special value -00 will cause Perl to slurp files in paragraph mode. Any value -0400 or above will cause Perl to slurp files whole, but by convention the value -0777 is the one normally used for this purpose.

Obviously, this may not work well for large files, since the whole file is held in memory; in that case you'll need to code some type of buffer to do the replacement. We can't advise any better, though, without real information about your intent.

Grepping across line boundaries

So you want to grep across line boundaries...

You quite possibly already have pcregrep installed. As you may know, PCRE stands for Perl-Compatible Regular Expressions, and the library is definitely Perl-style, though not identical to Perl.

To match across multiple lines, you have to turn on the multi-line mode -M, which is not the same as (?m): -M lets a match span more than one line, whereas (?m) only changes where ^ and $ match.

Running pcregrep -M "(?s)^b.*\\d+" text.txt

On this text file:

a
b
c11

The output will be

b
c11

whereas plain grep, which matches one line at a time, would return nothing.

Excerpt from the doc:

-M, --multiline Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline characters and internal occurrences of ^ and $ characters. The output for a successful match may consist of more than one line, the last of which is the one in which the match ended. If the matched string ends with a newline sequence the output ends at the end of that line.

When this option is set, the PCRE library is called in "multiline" mode. There is a limit to the number of lines that can be matched, imposed by the way that pcregrep buffers the input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is the shorter) are available for forward matching, and similarly the previous 8K characters (or all the previous characters, if fewer than 8K) are guaranteed to be available for lookbehind assertions. This option does not work when input is read line by line (see --line-buffered.)
