Perl-将正则表达式之间的行推入数组的一个元素

Question

这是我正在处理的日志文件-

|
blah1a
blah1b
blah1c
|
****blahnothing1
|
blah2a
blah2b
blah2c
|
blahnothing2
|
blah3a
blah3b
blah3c
|
blahnothing3

我需要的信息位于两个管道字符之间。 有很多以星号开头的行，我跳过它们。 每行都有Windows行尾字符。 这些竖线字符之间的数据是连续的，但是在linux主机上读取时，它们会被windows换行符砍掉。 我在两行之间用范围运算符编写了perl脚本，希望以管道定界符开头的所有内容都被压入数组元素，然后在下一个管道定界符处停止，然后再次开始。 每个数组元素将在两个管道字符之间包含所有行。

理想情况下，如果没有Windows控件字符，则数组将看起来像这样。

$lines[0] blah1a blah1b blah1c
$lines[1] blah2a blah2b blah2c
$lines[2] blah3a blah3b blah3c

但是，每个数组看起来都不像那样。

#!/usr/bin/perl

use strict ;
use warnings ;

my $delimiter = "|";
my $filename = $ARGV[0] ;
my @lines ;
open(my $fh, '<:encoding(UTF-8)' , $filename) or die "could not open file $filename $!";

while (my $line = readline $fh) {
    next if ($line =~/^\*+/) ;
    if ($line =~ /$delimiter/ ... $line =~/$delimiter/) {
    push (@lines, $line) ;
    }


}

print  $lines[0] ;
print  $lines[1] ;
print  $lines[2] ;

Answer 1

这似乎满足您的要求

我已经将blahnothing2和blahnothing3这两条线blahnothing2在原处，因为我看不到删除它们的理由

\\R regex模式是通用换行符 ，它匹配来自任何平台（例如CR，LF或CRLF）的换行符序列

use strict;
use warnings 'all';

my $data = do {
    open my $fh, '<:raw', 'blah.txt' or die $!;
    local $/;
    <$fh>;
};

$data =~ s/^\s*\*.*\R/ /gm; # Remove lines starting with *
$data =~ s/\R/ /g;          # Change all line endings to spaces

# Split on pipe and remove blank elements
my @data = grep /\S/, split /\s*\|\s*/, $data; 

use Data::Dump;
dd \@data;

产量

[
  "blah1a blah1b blah1c",
  "blah2a blah2b blah2c",
  "blahnothing2",
  "blah3a blah3b blah3c",
  "blahnothing3 ",
]

Answer 2

似乎要合并|之间的行 ，转换为字符串，然后将其放置在数组上。

一种方法是设置| 作为输入记录分隔符，因此每次读取管道之间的块

{  # localize the change to $/

    local $/ = "|";
    open(my $fh, '<:encoding(UTF-8)' , $filename) 
        or die "could not open file $filename $!";

    my @records;
    while (my $section = <$fh>)
    {
        next if $section =~ /^\s*\*/;  
        chomp $section;                # remove the record separator (| here)
        $section =~ s/\R/ /g;          # clean up newlines
        $section =~ s/^\s*//;          # clean up leading spaces
        push @records, $section if $section;
    }
    print "$_\n" for @records;
}

如果以“ * （和可选空格）开头的“节”，我将跳过。 可能会有更多限制性版本。 $section最终可能是空字符串，因此我们有条件地push其push数组。

输出，将问题中的示例复制粘贴到具有$filename的输入文件中

blah1a blah1b blah1c 
blah2a blah2b blah2c 
blahnothing2 
blah3a blah3b blah3c 
blahnothing3

问题中的方法很好，但是您需要合并“部分”（在管道之间）内的行，并将每个这样的字符串放置在数组上。 因此，您需要一个标志来跟踪输入/离开部分的时间。

Perl-将正则表达式之间的行推入数组的一个元素

问题描述

2 个解决方案

解决方案1
2 已采纳 2017-03-02 20:32:33

产量

解决方案2
1 2017-03-02 20:10:12

Perl-将正则表达式之间的行推入数组的一个元素

问题描述

2 个解决方案

解决方案1 2 已采纳 2017-03-02 20:32:33

产量

解决方案2 1 2017-03-02 20:10:12

解决方案1
2 已采纳 2017-03-02 20:32:33

解决方案2
1 2017-03-02 20:10:12