How can I grab multiple lines after a matching line in Perl?

I'm parsing a large file in Perl line by line (each line terminated by \n), but when I reach a certain keyword, say "TARGET", I need to grab all the lines between TARGET and the next completely empty line.

So, given a segment of a file:

Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line
\n

It should become:
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

The reason I'm having trouble is that I'm already going through the file line by line; how do I change what I delimit by midway through the parsing process?

You want something like this:

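my @grabbed;
while (<FILE>) {
    if (/TARGET/) {
        push @grabbed, $_;          # keep the TARGET line itself
        while (<FILE>) {            # keep reading from the same handle
            last if /^$/;           # stop at the first completely empty line
            push @grabbed, $_;
        }
    }
}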
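A self-contained way to try the nested-loop approach is to read the question's sample from an in-memory filehandle (open on a reference to a string). This sketch assumes a case-insensitive match, since the sample says "Target" rather than "TARGET"; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample input from the question, held in a string so the sketch is self-contained.
my $text = "Line 1\nLine 2\nLine 3\nLine 4 Target\nLine 5 Grab this line\n"
         . "Line 6 Grab this line\n\nLine 7\n";

# Open the string as a read-only in-memory file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @grabbed;
while (my $line = <$fh>) {
    next unless $line =~ /TARGET/i;    # case-insensitive: the sample says "Target"
    push @grabbed, $line;              # keep the TARGET line itself
    # The inner loop reads from the same handle, so when it ends the
    # outer loop resumes right after the empty line.
    while (my $next = <$fh>) {
        last if $next =~ /^$/;         # stop at the first completely empty line
        push @grabbed, $next;
    }
}
close $fh;

print @grabbed;
```

Because both loops share one handle, there is no need to change the delimiter at all: the inner loop simply consumes the lines it wants.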

The range operator is ideal for this sort of task:

$ cat try
#! /usr/bin/perl

while (<DATA>) {
  print if /\btarget\b/i .. /^\s*$/
}

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Nope
Line 7 Target
Line 8 Yep

Nope again

$ ./try
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Line 7 Target
Line 8 Yep
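If you want each block without its terminating empty line, the value returned by the range operator helps: perlop documents that in scalar context it returns a sequence number starting at 1, and that the final number in a range has "E0" appended. A minimal sketch, assuming the same sample data held in a string and collecting each TARGET block into its own array:

```perl
use strict;
use warnings;

my $text = "Line 1\nLine 2\nLine 3\nLine 4 Target\nLine 5 Grab this line\n"
         . "Line 6 Grab this line\n\nNope\nLine 7 Target\nLine 8 Yep\n\nNope again\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @blocks;    # one array ref of lines per TARGET block
while (my $line = <$fh>) {
    # Scalar-context flip-flop; the operands test $line explicitly.
    my $seq = ($line =~ /\btarget\b/i) .. ($line =~ /^\s*$/);
    next unless $seq;                  # outside any block
    push @blocks, [] if $seq == 1;     # sequence restarts at 1 for a new block
    next if $seq =~ /E0\z/;            # final line of the range: the empty terminator
    push @{ $blocks[-1] }, $line;
}
close $fh;

print "--\n", @$_ for @blocks;
```

Skipping the "E0" line keeps exactly the TARGET line and the lines after it, which matches the output the question asks for.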

The short answer: the line delimiter in Perl is $/, so when you hit TARGET, you can set $/ to "\n\n", read the next "line" (which then runs up to and including the next empty line), then set it back to "\n"... et voilà!
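A minimal sketch of that idea, assuming the data is in an in-memory string and only the first TARGET block is wanted. Using local inside a do block restores $/ automatically, so there is nothing to set back by hand. One caveat: if the empty line comes immediately after the TARGET line, the "\n\n" read runs on to the following blank line instead.

```perl
use strict;
use warnings;

my $text = "Line 1\nLine 2\nLine 3\nLine 4 Target\nLine 5 Grab this line\n"
         . "Line 6 Grab this line\n\nNext Line\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $grabbed;
while (my $line = <$fh>) {
    next unless $line =~ /TARGET/i;
    # Temporarily make "\n\n" the record separator: the next read returns
    # everything up to and including the next empty line.
    my $rest = do { local $/ = "\n\n"; <$fh> };
    $grabbed = $line . (defined $rest ? $rest : '');
    last;    # only the first TARGET block is wanted in this sketch
}
# local has restored $/ to "\n", so ordinary line reads resume here.
my $after = <$fh>;
close $fh;

print $grabbed;
```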

Now for the longer one: if you use the English module (which gives sensible names to all of Perl's magic variables), then $/ is called $RS or $INPUT_RECORD_SEPARATOR. If you use IO::Handle, then IO::Handle->input_record_separator("\n\n") will work (it is a class method that sets the global $/, not a per-handle value).

And if you're doing this as part of a bigger piece of code, don't forget to either localize it (using local $/ = "\n\n"; in the appropriate scope) or to set $/ back to its original value of "\n".
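The same read can be written with the English module's name for $/ and confined to a small helper, so the separator is restored as soon as the helper returns. A sketch, where read_until_blank_line is a hypothetical name and the input is an in-memory string:

```perl
use strict;
use warnings;
use English qw(-no_match_vars);    # provides $INPUT_RECORD_SEPARATOR, an alias of $/

# Hypothetical helper: read one record that ends at the next empty line.
sub read_until_blank_line {
    my ($fh) = @_;
    # local restores the previous $/ when the sub returns, even on die.
    local $INPUT_RECORD_SEPARATOR = "\n\n";
    return scalar <$fh>;
}

my $text = "a\nb\n\nc\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $record = read_until_blank_line($fh);
my $next   = <$fh>;    # back to ordinary line-by-line reading
close $fh;
```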

From perlfaq6's answer to "How can I pull out lines between two patterns that are themselves on different lines?":


You can use Perl's somewhat exotic .. operator (documented in perlop):

perl -ne 'print if /START/ .. /END/' file1 file2 ...

If you wanted text and not lines, you would use

perl -0777 -ne 'print "$1\n" while /START(.*?)END/gs' file1 file2 ...
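Inside a script, the -0777 switch corresponds to undefining $/ so that one read returns the whole file. A sketch of the same START/END extraction, assuming the input is an in-memory string:

```perl
use strict;
use warnings;

my $text = "junk\nSTART one\ntwo END tail\nmore junk\nSTART three END\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

# Like -0777: with $/ undefined, a single read returns everything.
my $all = do { local $/; <$fh> };
close $fh;

my @chunks;
# /s lets . match newlines, .*? stops at the nearest END, /g finds every match.
while ($all =~ /START(.*?)END/gs) {
    push @chunks, $1;
}
print "$_\n" for @chunks;
```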

But if you want nested occurrences of START through END, you'll run up against the problem described in the question in this section on matching balanced text.

Here's another example of using ..:

while (<>) {
    $in_header =   1  .. /^$/;
    $in_body   = /^$/ .. eof;
# now choose between them
} continue {
    $. = 0 if eof;  # fix $.
}
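To make the header/body example concrete, here is a sketch that splits a small mail-like message, assuming a single input held in a string (so the $. reset in the continue block is not needed). A constant operand such as 1 is compared against $., so the first range runs from line 1 to the first empty line.

```perl
use strict;
use warnings;

my $msg = "From: alice\nSubject: hello\n\nBody line 1\nBody line 2\n";
open my $fh, '<', \$msg or die "Cannot open in-memory file: $!";

my (@header, @body);
while (<$fh>) {
    my $in_header =    1 .. /^$/;    # from line 1 ($. == 1) to the first empty line
    my $in_body   = /^$/ .. eof;     # from the first empty line to end of file
    # The empty separator line is inside both ranges; this sketch files it under the header.
    if    ($in_header) { push @header, $_ }
    elsif ($in_body)   { push @body,   $_ }
}
close $fh;
```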

my $buffer = '';
while (<FILE>)
{
    if (/target/i)
    {
        $buffer .= $_;              # the TARGET line
        while (<FILE>)
        {
            $buffer .= $_;          # note: the terminating empty line is kept too
            last if /^\n$/;
        }
    }
}

use strict;
use warnings;

my $inside = 0;
my $data = '';
while (<DATA>) {
    $inside = 1 if /Target/;
    last if /^$/ and $inside;
    $data .= $_ if $inside;
}

print '[' . $data . ']';

__DATA__
Line 1
Line 2
Line 3
Line 4 Target
Line 5 Grab this line
Line 6 Grab this line

Next Line

Edited to fix the exit condition, as per the note below.

If you don't mind ugly auto-generated code, and assuming you just want the lines between TARGET and the next empty line and want all other lines dropped, you can use the output of this command:

s2p -ne '/TARGET/,/^$/p'

(Yes, this is a hint that this problem is usually much more easily solved in sed. :-P)

If you only want one loop (modifying Dave Hinton's code):

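my @grabbed;
my $grabbing = 0;
while (<FILE>) {
    if (/TARGET/) {
        $grabbing = 1;       # start collecting at the TARGET line
    } elsif (/^$/) {
        $grabbing = 0;       # stop at the next empty line (not collected)
    }
    if ($grabbing) {
        push @grabbed, $_;   # $_ holds the current line (not @_)
    }
}

while (<IN>) {
    print OUT if /Target/ .. /^$/;
}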

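Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.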

© 2020-2024 STACKOOM.COM