How to iterate over a multiline string with Perl's regex

I need to extract several sections from a multiline string with Perl. I'm applying the same regex in a while loop. My problem is getting the last section, which ends at the end of the file. My workaround is to append the marker, so that the regex always finds an end. Is there a better way to do it?

Example file:

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Perl script:

#!/usr/bin/env perl

my $desc = do { local $/ = undef; <> };

$desc .= "\n===="; # set the end marker

while($desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgsp) {
  print "filename=", $+{filename}, "\n";
  print "content=", $+{content}, "\n";
}

This way the script finds both segments. How can I avoid adding the marker?

Use of the non-greediness modifier ? is a giant red flag. You can usually get away with using it once in a pattern, but using it more than once is usually a bug. If you want to match text that doesn't contain a string, use the following instead:

(?:(?!STRING).)*

So that gets you the following:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
   (?<content> (?:(?! ^==== ).)* )
/xsmg

Code:

my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Output:

filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>
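
A smaller change to the question's own pattern may also work: let the closing lookahead accept either the next marker or the end of the string (\z), so no marker has to be appended. The following is only a sketch and not part of the answer above; the helper name extract_sections and the shortened sample string are made up for illustration. It keeps a single non-greedy quantifier and uses a negated character class for the filename.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# [filename, content] pairs found in the slurped text.
sub extract_sections {
    my ($desc) = @_;
    my @sections;
    while (
        $desc =~ /
            ^==== [ ] (?<filename> [^\n\#]+ ) \# [^\n]* [ ] ====\n
            (?<content> .*? )       # the only non-greedy quantifier
            (?= ^==== | \z )        # stop at the next marker or at end of string
        /xsmg
    ) {
        push @sections, [ $+{filename}, $+{content} ];
    }
    return @sections;
}

# Shortened sample input, assumed to follow the layout of the example file.
my $desc = "Header\n\n"
         . "==== /home/src/file1.c#1 ====\nline 1 of file1\n\n"
         . "==== /home/src/file2.c#1 ====\nline 1 of file2\n";

for my $section ( extract_sections($desc) ) {
    my ( $filename, $content ) = @$section;
    print "filename=<<$filename>>\n";
    print "content=<<$content>>\n";
}
```

Unlike the pattern in the answer, this one stops the filename at the # character, so the revision suffix is not part of the captured name.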

You've made this more awkward by slurping the whole file in the first place. This is relatively simple if you read the file line by line:

use strict;
use warnings 'all';

my $file;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( $file ) {
        print;
    }
}

Output:

filename=/home/src/file1.c
content=content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

filename=/home/src/file2.c
content=content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Alternatively, if you need to store the whole content per file, perhaps in a hash, it would look like this:

use strict;
use warnings 'all';

my $file;
my %data;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
    }
    elsif ( $file ) {
        $data{$file} .= $_;
    }
}

for my $file ( sort keys %data ) {
    print "filename=$file\n";
    print "content=$data{$file}";
}

The output is identical to that of the first version above.
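
Note that sort keys prints the files in alphabetical order, which happens to match the order of the sample input. If the original order of appearance matters, one option is to record each filename in an array as it is first seen. This is a sketch under that assumption; the helper name read_sections and the in-memory sample input are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): reads lines from a
# filehandle and returns (\@order, \%data), where @order lists filenames
# in order of first appearance and %data maps filename to content.
sub read_sections {
    my ($fh) = @_;
    my ( $file, @order, %data );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
            $file = $1;
            push @order, $file unless exists $data{$file};
            $data{$file} //= '';    # keep sections that have no content lines
        }
        elsif ( defined $file ) {
            $data{$file} .= $line;
        }
    }
    return ( \@order, \%data );
}

# Sample input held in a string so the sketch is self-contained.
my $input = "Header\n\n"
          . "==== /home/src/zeta.c#1 ====\nline 1 of zeta\n\n"
          . "==== /home/src/alpha.c#2 ====\nline 1 of alpha\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my ( $order, $data ) = read_sections($fh);
close $fh;

for my $file (@$order) {
    print "filename=$file\n";
    print "content=$data->{$file}";
}
```

Here zeta.c is printed before alpha.c because it appears first in the input, whereas sort keys would reverse them.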
