How to iterate over a multi-line string with Perl regular expressions

Question

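I need to extract several sections from a multi-line string with Perl. I apply the same regular expression in a while loop. My problem is getting the last section, which ends at the end of the file. My workaround is to append an end marker, so that the regex always finds an end. Is there a better way?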

Example file:

Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Perl script:

#!/usr/bin/env perl

my $desc = do { local $/ = undef; <> };

$desc .= "\n===="; # set the end marker

while($desc =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgsp) {
  print "filename=", $+{filename}, "\n";
  print "content=", $+{content}, "\n";
}

With the marker, the script finds both sections. How can I avoid adding the marker?

Answer 1

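Using the non-greedy modifier ? is a big red flag. You can usually get away with using it once in a pattern, but more than once is usually a bug. If you want to match text that does not contain a given string, use the following instead: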

(?:(?!STRING).)*

That gives you the following:

/
   ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
   (?<content> (?:(?! ^==== ).)* )
/xsmg
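A small, made-up input illustrates why several .*? in one pattern are a warning sign. Under /s, the lazy (?<filename>.*?)# in the question's pattern may cross newlines, so a header that happens to lack a # is merged with the following section instead of simply not matching. The sample text is hypothetical; the two regexes are the question's pattern and the pattern above. In this sketch the question's pattern reports a single "filename" spanning three lines, while the pattern above reports /home/src/a.c and /home/src/b.c#1.

```perl
use strict;
use warnings;

# Hypothetical input: the first header lacks the "#revision" suffix.
my $raw = "==== /home/src/a.c ====\nfoo\n==== /home/src/b.c#1 ====\nbar\n";

# The question's pattern, run on the text with its end marker appended.
my $marked = $raw . "\n====";
my @lazy_names;
while ($marked =~ /^==== (?<filename>.*?)#.*?====$(?<content>.*?)(?=^====)/mgs) {
    push @lazy_names, $+{filename};
}

# The pattern above: [^\n]+ cannot leave the header line, and the
# (?:(?!^====).)* content stops at the next header or at end of string.
my @tempered_names;
while ($raw =~ /
    ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
    (?<content> (?:(?! ^==== ).)* )
/xsmg) {
    push @tempered_names, $+{filename};
}

print "lazy:     <<$_>>\n" for @lazy_names;
print "tempered: <<$_>>\n" for @tempered_names;
```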

Code:

use strict;
use warnings;

my $desc = do { local $/; <DATA> };

while (
   $desc =~ /
      ^==== [ ] (?<filename> [^\n]+ ) [ ] ====\n
      (?<content> (?:(?! ^==== ).)* )
   /xsmg
) {
   print "filename=<<$+{filename}>>\n";
   print "content=<<$+{content}>>\n";
}

__DATA__
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2

Output:

filename=<</home/src/file1.c#1>>
content=<<content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

>>
filename=<</home/src/file2.c#1>>
content=<<content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
>>
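A smaller change to the question's own loop also avoids the marker: let the lookahead accept end of string as well as the next header, i.e. (?=^====|\z). This is a minimal sketch using a shortened copy of the sample text; it also replaces the header's .*? with [^#\n]* so the filename cannot run past the end of its line. The hash %content_of is an illustrative name, not from the question.

```perl
use strict;
use warnings;

my $desc = <<'END_TEXT';
Header

==== /home/src/file1.c#1 ====
content file1
line 1 of file1

==== /home/src/file2.c#1 ====
content file2
line 1 of file2
END_TEXT

my %content_of;
# Same shape as the question's pattern, but the lookahead also accepts
# end of string (\z), so no "\n====" marker has to be appended.
while ($desc =~ /^==== (?<filename>[^#\n]*)#[^\n]*====$(?<content>.*?)(?=^====|\z)/mgs) {
    $content_of{ $+{filename} } = $+{content};
    print "filename=", $+{filename}, "\n";
    print "content=", $+{content}, "\n";
}
```

As in the question, each captured content starts with the newline that ends the header line, because $ matches just before it.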

Answer 2

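First, slurping the whole file into one string makes this more awkward than it needs to be. If you read the file line by line, it is relatively simple: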

use strict;
use warnings 'all';

my $file;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( defined $file ) {
        print;
    }
}

Output

filename=/home/src/file1.c
content=content file1
line 1 of file1
line 2 of file1
line 3 of file1

another line of file1

filename=/home/src/file2.c
content=content file2
line 1 of file2
line 2 of file2
line 3 of file2

another line of file2
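If the text is already held in a scalar, as $desc is in the question, the line-by-line approach still works: Perl can open a filehandle on a reference to a scalar (an in-memory file). A sketch, with a sample string abbreviated from the example file; @filenames is added only so the result can be checked.

```perl
use strict;
use warnings;

# Text already slurped into a scalar, as $desc is in the question.
my $desc = "Header\n\n==== /home/src/file1.c#1 ====\nline 1 of file1\n\n"
         . "==== /home/src/file2.c#1 ====\nline 1 of file2\n";

# Opening a filehandle on \$desc lets the same loop read the string line by line.
open my $fh, '<', \$desc or die "Cannot open in-memory filehandle: $!";

my $file;
my @filenames;
while ( my $line = <$fh> ) {
    if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;    # filename without the "#revision" suffix
        push @filenames, $file;
        print "filename=$file\n";
        print 'content=';
    }
    elsif ( defined $file ) {
        print $line;   # body line of the current section
    }
}
close $fh;
```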

Alternatively, if you need to store the whole content of each file in a hash, it might look like this:

use strict;
use warnings 'all';

my $file;
my %data;

while ( <> ) {
    if ( /^====\s+(.*\S)#\S*\s+====/ ) {
        $file = $1;
    }
    elsif ( defined $file ) {
        $data{$file} .= $_;
    }
}

for my $file ( sort keys %data ) {
    print "filename=$file\n";
    print "content=$data{$file}";
}

The output is the same as that of the first version above.
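A hash keeps no order, so sort keys %data only reproduces the file order here because the sample filenames happen to sort that way. If the original order matters, one option is an array of [filename, content] pairs. This sketch assumes the text is in a string; split /^/ breaks it into lines that keep their trailing newline, and the sample deliberately lists file2 first to show the order is preserved.

```perl
use strict;
use warnings;

my $desc = "Header\n\n==== /home/src/file2.c#1 ====\nline 1 of file2\n\n"
         . "==== /home/src/file1.c#1 ====\nline 1 of file1\n";

# split /^/ is treated as split /^/m: one element per line, "\n" kept.
my @sections;    # list of [ filename, content ] in file order
for my $line ( split /^/, $desc ) {
    if ( $line =~ /^====\s+(.*\S)#\S*\s+====/ ) {
        push @sections, [ $1, '' ];    # start a new section
    }
    elsif ( @sections ) {
        $sections[-1][1] .= $line;     # append to the current section
    }
}

for my $section ( @sections ) {
    my ( $file, $content ) = @$section;
    print "filename=$file\n";
    print "content=$content";
}
```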

Answer 1: 4 votes, accepted, 2016-06-15 03:35:01

Answer 2: 1 vote, 2016-06-15 14:22:33
