
How can I extract certain lines with Perl?

I have a string like this:

Modified files: ['A', 'B']

File: /tpl/src/vlan/VlanInterfaceValidator.cpp

Newly generated warnings:
A has warnings
B has warning

Status: PASS

I want the value of "Newly generated warnings:", which should be:

A has warnings
B has warning

I am new to Perl and don't know how to use regexes in Perl. Kindly help.

Here are two options:

  • split the string into lines, and filter the array of lines using grep
  • use a regex on the multi-line string
my $str = "
Modified files: ['A', 'B']

File: /tpl/src/vlan/VlanInterfaceValidator.cpp

Newly generated warnings:
A has warnings
B has warning

Status: PASS";

my @lines = grep{ /\w+ has warning/ } split(/\n/, $str);

print "Option 1 using split and grep:\n";
print join("\n", @lines);

$str =~ s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm;
print "\n\nOption 2 using regex:\n";
print $str;

Output:

Option 1 using split and grep:
A has warnings
B has warning

Option 2 using regex:
A has warnings
B has warning

Explanation for option 1:

  • split(/\n/, $str) - split the string into an array of lines
  • grep{ /\w+ has warning/ } - filter with a regex to keep only the lines of interest
    • Note: This is short for the standard regex test $_ =~ /\w+ has warning/ . $_ holds the current list element, i.e. the current line.
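
One caveat with option 1 is that the grep keeps every line containing "has warning", wherever it appears in the string. As a minimal sketch, assuming sections are separated by blank lines as in the sample, the same split-and-grep idea can be applied to whole paragraphs first. The helper name section_lines is hypothetical, not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the
# "Newly generated warnings:" header, up to the next blank line.
sub section_lines {
    my ($text) = @_;

    # Split into blank-line-separated paragraphs and keep the first
    # one that starts with the header.
    my ($para) = grep { /^Newly generated warnings:/ } split /\n\s*\n/, $text;
    return () unless defined $para;

    my @lines = split /\n/, $para;
    shift @lines;    # drop the header line itself
    return @lines;
}

my $sample = "Modified files: ['A', 'B']\n\n"
           . "Newly generated warnings:\nA has warnings\nB has warning\n\n"
           . "Status: PASS";

print "$_\n" for section_lines($sample);
```

This returns an empty list when the header is missing, so the caller can tell "no section" apart from a section with content.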

Explanation for option 2:

  • $str =~ s/search/replace/ - standard search and replace on a string
    • Note: Unlike in many other languages, strings are mutable in Perl
  • s/^.*Newly generated warnings:\s*(.*?)\s+Status:.*$/$1/sm :
    • search:
      • ^.* - from the beginning of the string, grab everything until:
      • Newly generated warnings:
      • \s* - scan over optional whitespace
      • (.*?) - capture group 1 with non-greedy scan
      • \s+Status:.*$ - scan over whitespace, Status: , and everything else to the end of the string
    • replace:
      • $1 - use capture group 1
    • flags:
      • s - dot matches newlines
      • m - multi-line mode: ^ and $ also match at the start and end of each line, not only of the whole string. Here the greedy .* (with s) still reaches the start and end of the string, so the pattern behaves the same without m.
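
The substitution in option 2 overwrites $str, and if the pattern does not match, $str is simply left unchanged, which is easy to miss. A possible variation, shown here only as a sketch, is to capture with a plain match in list context so the original string stays intact. The helper name extract_warnings is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the header and "Status:",
# or undef when the section is not present.
sub extract_warnings {
    my ($text) = @_;

    # A match in list context returns its capture groups;
    # the s flag lets . match newlines.
    my ($block) = $text =~ /Newly generated warnings:\s*(.*?)\s+Status:/s;
    return $block;
}

my $report = "File: example\n\n"
           . "Newly generated warnings:\nA has warnings\nB has warning\n\n"
           . "Status: PASS";

my $found = extract_warnings($report);
print defined $found ? "$found\n" : "no warnings section found\n";
```

Because only the part around the section is matched, the leading ^.* and trailing .*$ from the substitution are not needed here.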

This is the sort of problem where you can read lines up to the start of the section you want and do nothing with them, then keep reading until that section ends, keeping those lines:

# ignore all these lines
while( <DATA> ) {
    last if /Newly generated warnings/;
    }

# process all these lines
while( <DATA> ) {
    last if /\A\s*\z/;  # stop at the first blank line
    print;  # do whatever you need
    }

__END__
Modified files: ['A', 'B']

File: /tpl/src/vlan/VlanInterfaceValidator.cpp

Newly generated warnings:
A has warnings
B has warning

Status: PASS

That's reading from a filehandle. Handling a string is just as easy because you can open a filehandle on a string and then process the string line by line:

my $string = <<'HERE';
Modified files: ['A', 'B']

File: /tpl/src/vlan/VlanInterfaceValidator.cpp

Newly generated warnings:
A has warnings
B has warning

Status: PASS
HERE

open my $fh, '<', \$string or die "Cannot open in-memory filehandle: $!";

while( <$fh> ) {
    last if /Newly generated warnings/;
    }

while( <$fh> ) {
    last if /\A\s*\z/;
    print;  # do whatever you need
    }
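
If the lines are needed later rather than printed immediately, the two loops can be wrapped in a subroutine that collects them. This is only a sketch of that idea; the name read_section is hypothetical, and it assumes the section ends at the first blank line, as above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the chomped lines between the
# "Newly generated warnings" header and the next blank line.
sub read_section {
    my ($string) = @_;

    open my $fh, '<', \$string
        or die "Cannot open in-memory filehandle: $!";

    # Skip everything up to and including the header line.
    while ( my $line = <$fh> ) {
        last if $line =~ /Newly generated warnings/;
    }

    # Keep lines until the first blank line (or end of input).
    my @kept;
    while ( my $line = <$fh> ) {
        last if $line =~ /\A\s*\z/;
        chomp $line;
        push @kept, $line;
    }

    close $fh;
    return @kept;
}

my $input = "Modified files: ['A', 'B']\n\n"
          . "Newly generated warnings:\nA has warnings\nB has warning\n\n"
          . "Status: PASS\n";

print "$_\n" for read_section($input);
```

Using a lexical $line instead of $_ keeps the subroutine from clobbering the caller's $_.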
