How to use Perl to merge multiple lines into a single line

I am trying to use Perl to convert a text file from the input format to the output format shown below, but without success.

Can anyone help?

Input:

row1 multiline 1
row1 multiline 2
row1 multiline 3
row2 multiline 1
row2 multiline 2

Expected output:

row1 multiline 1 multiline 2 multiline 3
row2 multiline 1 multiline 2

This will do as you ask. It checks whether the first field on each line has changed to decide whether to continue the current output line or start a new one.

It expects the path to the input file as a parameter on the command line.

use strict;
use warnings;

my $row;

while ( <> ) {

    next unless /\S/;
    chomp;

    my ( $new_row, $rest ) = split ' ', $_, 2;

    if ( defined $row and $row eq $new_row ) {
        print ' ', $rest;
    }
    else {
        print "\n" if defined $row;
        print $_;
        $row = $new_row;
    }
}

print "\n";

Output:

row1 multiline 1 multiline 2 multiline 3
row2 multiline 1 multiline 2

In one regex? Not very likely. Applying the same regex multiple times, however, is plausible. Just keep matching against this until it stops matching:

while ($input =~ s/row(\d+)((?: multiline \d+)+)\n+row\1/row$1$2/gm){}

The loop roughly halves the number of unmerged lines with every iteration, so it loops only O(log n) times.

You can see it in action here: https://ideone.com/RP30h6
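To get a feel for the logarithmic behaviour, here is a rough sketch that counts how many passes of the substitution loop above actually change the text. The helper name count_passes and the generated test input (n lines that all share the label row1) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# count_passes is a hypothetical helper: it builds $n lines that all share
# the label "row1", runs the same substitution as the loop above, and
# returns how many passes changed the text.
sub count_passes {
    my ($n) = @_;
    my $input = join '', map "row1 multiline $_\n", 1 .. $n;
    my $passes = 0;
    $passes++
        while $input =~ s/row(\d+)((?: multiline \d+)+)\n+row\1/row$1$2/gm;
    return $passes;
}

printf "%3d lines -> %d passes\n", $_, count_passes($_) for 2, 4, 8, 16, 64;
```

Each doubling of the line count should add one pass, which is the behaviour you would expect from a loop that merges neighbouring lines pairwise.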


The above solution is more esoteric than practical. Here is what a more practical solution might look like:

my $row_number = 0;
my ($row, $column);

while ($input =~ /(row(\d+) multiline (\d+))/gm) {
    if ($row_number != $2) {
        $row_number = $2;
    }
    else {
        $row = $1;
        $column = $3;
        $input =~ s/\n+$row/ multiline $column/g;
    }
}

Demo: https://ideone.com/Mk2QqZ

This can be done using a replacement callback.
In Perl, this is typically accomplished by using the s///e evaluation form.

The main regex just captures each block of lines that share a row label.
Buffer 1 is the first row, buffer 3 is the remaining rows with the same label.

These are passed to the merge sub.
The merge sub strips the row label from the remaining rows via another regex,
then combines the first row with what is left of them.
The result is then passed back as the replacement.
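As a smaller illustration of the s///e form on its own, here is a minimal sketch. The callback name shout and the sample text are made up for this example and are not part of the solution.

```perl
use strict;
use warnings;

# shout is a hypothetical callback: it receives the captured text and
# returns whatever should replace the whole match.
sub shout {
    my ($word) = @_;
    return uc $word;
}

my $text = 'row1 multiline 1';

# With /e the replacement side is evaluated as Perl code for every match,
# so the return value of shout($1) becomes the replacement text.
( my $changed = $text ) =~ s/(multiline)/ shout($1) /eg;

print "$changed\n";    # prints "row1 MULTILINE 1"
```

The full solution uses the same mechanism, only with a larger pattern and a sub that takes two captures.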

Perl code:

use strict;
use warnings;

$/ = undef;

my $input = <DATA>;

sub mergeRows {
    my ($first_row, $other_rows) = @_;
    $other_rows =~ s/(?m)\s*^\w+\s*(.*)(?<!\s)\s*/$1 /g;
    return $first_row . " " . $other_rows . "\n";
}

$input =~ s/(?m)(^(\w+).*)(?<!\s)\s+((?:\s*^\2.*)+)/ mergeRows($1,$3) /eg;

print $input, "\n";

__DATA__
row1 multiline 1

row1 multiline 2

row1 multiline 3

row2 multiline 1

row2 multiline 2

Output:

row1 multiline 1 multiline 2 multiline 3

row2 multiline 1 multiline 2

Main regex:

 (?m)                          # Multi-line mode
 (                             # (1 start), First of common row
      ^ 
      ( \w+ )                       # (2), common row label
      .* 
 )                             # (1 end)
 (?<! \s )                     # Force trim of trailing spaces
 \s+                           # Consume a newline, also get all the next whitespaces
 (                             # (3 start), Remaining common rows
      (?:
           \s* ^ \2  .* 
      )+
 )                             # (3 end)

Merge sub regex:

 (?m)                          # Multi-line mode
 \s*                           # remove
 ^ \w+ \s*                     # remove
 ( .* )                        # (1), What will be saved
 (?<! \s )                     # remove, force trim of trailing spaces
 \s*                           # remove, possibly many newlines (whitespace)

You have a key field as the first word, and then the rest of the line as a value.

So I would approach your problem like this:

#!/usr/bin/env perl
use strict;
use warnings;

my %rows;
while (<DATA>) {
    my ( $key, $rest_of_line ) = (m/^(\w+) (.*)/);
    push( @{ $rows{$key} }, $rest_of_line );
}

foreach my $key ( sort keys %rows ) {
    print "$key ", join( " ", @{ $rows{$key} } ), "\n";
}

__DATA__
row1 multiline 1
row1 multiline 2
row1 multiline 3
row2 multiline 1
row2 multiline 2

It's a slightly different approach from the others, in that we read each line into a hash, then output the hash.

It doesn't maintain the order of your original file, but instead sorts the output by row label.
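If the original order matters, one possible variation is to remember the order in which each key is first seen. The sketch below is only an illustration of that idea; merge_rows is a hypothetical helper name, and it also skips blank or malformed lines, which the code above does not do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# merge_rows is a hypothetical helper: it groups the rest of each line
# under its first word and remembers the order in which keys first appear.
sub merge_rows {
    my @lines = @_;
    my ( %rows, @order );
    for my $line (@lines) {
        # Skip blank or malformed lines instead of warning about undef
        my ( $key, $rest ) = $line =~ /^(\w+)\s+(.*\S)/ or next;
        push @order, $key unless exists $rows{$key};
        push @{ $rows{$key} }, $rest;
    }
    # One output line per key, in first-seen order
    return map { join ' ', $_, @{ $rows{$_} } } @order;
}

my @input = (
    'row2 multiline 1',
    'row1 multiline 1',
    'row2 multiline 2',
    'row1 multiline 2',
);
print "$_\n" for merge_rows(@input);
```

With this sample input, row2 is printed before row1 because it appears first. Like the hash version above, it merges lines with the same key even when they are not adjacent.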
