How to iterate over a multi-line string with Perl regular expressions

Question

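I need to extract several sections from a multi-line string with Perl. I apply the same regex in a while loop. My problem is capturing the last section, which ends at the end of the file. My workaround is to append a marker, so that the regex always finds an end. Is there a better way?

Example file: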

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Perl script:

#!/usr/bin/env perl

my $desc = do { local $/ = undef; <> };

$desc .= "\n===="; # set the end marker

while($desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgsp) {
  print "filename=", $+{filename}, "\n";
  print "content=", $+{content}, "\n";
}

With the marker appended, the script finds both sections. How can I avoid adding the marker?

Answer 1

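Using the non-greedy modifier ? is a giant red flag. You can usually get away with using it once in a pattern, but more than that is usually a bug. If you want to match text that doesn't contain a string, use the following instead: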

(?:(?!STRING).)*

That gives you the following:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
   (?<content> (?:(?! ^==== ).)* )
/xsmg
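
To see why the tempered form is safer than a lazy quantifier, here is a minimal sketch on a made-up string (the input and the sub names lazy_capture and tempered_capture are purely illustrative, not from the question). The lazy .*? will run past a ] when that is the only way for the rest of the pattern to match, while (?:(?!\]).)* cannot cross one:

```perl
use strict;
use warnings;

# Lazy quantifier: .*? will still run past "]" if that is the only way to match.
sub lazy_capture {
    my ($text) = @_;
    my ($got) = $text =~ /\[(.*?)x=/;
    return $got;
}

# Tempered quantifier: a character is consumed only if "]" does not start there.
sub tempered_capture {
    my ($text) = @_;
    my ($got) = $text =~ /\[((?:(?!\]).)*)x=/;
    return $got;
}

my $text = '[a=1] [b=2 x=3]';    # hypothetical input
print "lazy=<<",     lazy_capture($text),     ">>\n";    # lazy=<<a=1] [b=2 >>
print "tempered=<<", tempered_capture($text), ">>\n";    # tempered=<<b=2 >>
```

The lazy capture leaks across the first bracket pair; the tempered one fails at the first [ and only matches inside the second pair.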

Code:

my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Output:

filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>
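
As an aside, if you would rather keep the pattern from the question, a smaller change that should also remove the need for the marker is to let the lookahead accept the end of the string as well, i.e. (?=^====|\z). A rough sketch under that assumption; the sections sub and the shortened sample input are made up for illustration:

```perl
use strict;
use warnings;

# Returns a list of [filename, content] pairs found in $desc.
# Same pattern as in the question, except that the lookahead also
# accepts \z (end of string), so no end marker has to be appended.
sub sections {
    my ($desc) = @_;
    my @found;
    while ( $desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====|\z)/mgs ) {
        push @found, [ $+{filename}, $+{content} ];
    }
    return @found;
}

# Shortened, hypothetical sample input.
my $desc = "Header\n\n"
         . "==== /home/src/file1.c#1 ====\nline A\n\n"
         . "==== /home/src/file2.c#1 ====\nline B\n";

for my $s ( sections($desc) ) {
    print "filename=$s->[0]\n";
    print "content=$s->[1]";
}
```

Note that content still begins with the newline that follows the header line, just as in the question's version.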

Answer 2

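You're making this more awkward for yourself by slurping the whole file to start with. It is relatively simple if you read the file line by line: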

use strict;
use warnings 'all';

my $file;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( $file ) {
        print;
    }
}

Output:

filename=/home/src/file1.c
content=content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

filename=/home/src/file2.c
content=content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Or, if you need to store the entire contents of each file in a hash, it might look like this:

use strict;
use warnings 'all';

my $file;
my %data;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
    }
    elsif ( $file ) {
        $data{$file} .= $_;
    }
}

for my $file ( sort keys %data ) {
    print "filename=$file\n";
    print "content=$data{$file}";
}

The output is identical to that of the first version above.
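
One thing to keep in mind with the hash version: sort keys prints the sections in name order, which only happens to match the input order for this sample. If the original order matters, a possible variation is to record the names in an array as they appear. This is only a sketch; read_sections, the in-memory filehandle, and the sample text are assumptions for the demo:

```perl
use strict;
use warnings;

# Reads lines from the filehandle $fh and returns (\@order, \%data).
# @order lists the filenames in the order they first appeared.
sub read_sections {
    my ($fh) = @_;
    my ( $file, @order, %data );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
            $file = $1;
            push @order, $file unless exists $data{$file};
            $data{$file} //= '';
        }
        elsif ( defined $file ) {
            $data{$file} .= $line;
        }
    }
    return ( \@order, \%data );
}

# Demo on an in-memory filehandle; the sample text is made up.
my $sample = "Header\n\n"
           . "==== /home/src/zeta.c#1 ====\nz line\n\n"
           . "==== /home/src/alpha.c#1 ====\na line\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ( $order, $data ) = read_sections($fh);
close $fh;

for my $file (@$order) {
    print "filename=$file\n";
    print "content=$data->{$file}";
}
```

Here zeta.c is printed before alpha.c, whereas sort keys would reverse them.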

Answer 1: 4 votes, accepted, 2016-06-15 03:35:01

Answer 2: 1 vote, 2016-06-15 14:22:33
