
Regex match does not produce the output in Perl

I have a test file that looks like this:

t # 3-0, 1
v 0 0
v 1 19
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-1, 1
v 0 0
v 1 15
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-2, 1
v 0 0
v 1 17
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-3, 1
v 0 0
v 1 18
v 2 7
u 0 1 2
u 0 2 2
u 1 2 2

I wrote the following code to match the last three lines of each transaction (each transaction starts with t #):

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!;

open (FILE, "$input") or die $!;

LOOP: while (<FILE>) {
    if (m/^(t\h*#\h*[0-9,\h-]+)/) {
        my $transaction_id = $1;
        while (<FILE>) {
            if (m/^(u\h+[0]\h+[1]\h+[2])/) {
                my $edge_1 = $1;
                while (<FILE>) {
                    if (m/^(u\h+[0]\h+[2]\h+[2])/) {
                        my $edge_2 = $1;
                        while (<FILE>) {
                            if (m/^(u\h+[1]\h+[2]\h+[2])/) {
                                my $edge_3 = $1;
                                print $transaction_id . "\t" . $edge_1 . "\t" . $edge_2 . "\t" . $edge_3 . "\n";
                                next LOOP;
                            }
                        }
                    }
                }
            }
        }
    }
}

close FILE;

However, it does not print any results. When I compile my program, it runs without errors. My ultimate goal is to produce output like the following, where I output the edges "u 0 1 2", "u 0 2 2" and "u 1 2 2" of each subgraph:

t # 3-0, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-1, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-2, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-3, 1   u 0 1 2   u 0 2 2   u 1 2 2

One way: keep all lines of a transaction in a buffer, and when you get to a new transaction id, store the previous id along with the last three lines from that buffer.

use warnings;
use strict;
use feature 'say';

my (@transactions, @trans_lines, $tid);

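while (<>) {
    chomp;

    if (/^(t\s*#\s*[0-9,\s-]+)/) {
        if (not defined $tid) {
            $tid = $1;   # the very first one starts
            next;
        }

        # Store previous id and its last three lines, reset
        push @transactions, [ $tid, @trans_lines[-3..-1] ];
        $tid = $1;
        @trans_lines = ();
        next;            # don't add the id line itself to the buffer
    }

    push @trans_lines, $_;
}

# Store the last transaction, since no new id line follows it
push @transactions, [ $tid, @trans_lines[-3..-1] ] if defined $tid;

say "@$_" for @transactions;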

This stores all transactions in an array, so they are easy to iterate over and keep the order from the file, which suits the output shown in the question. But an array makes it awkward to refer to one particular transaction; if you need to look up specific ids, consider using a hash of array references instead, as in the related problem.
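A minimal sketch of that hash-based variant, assuming transaction ids are unique in the file (a repeated id would overwrite the earlier one). The helper name collect_transactions and the in-memory sample are illustrative only; a separate @order array keeps the file order, which a hash alone does not.

```perl
use strict;
use warnings;

# Hypothetical helper: read transactions from a filehandle and return
# (\%by_id, \@order). %by_id maps each transaction id to the lines that
# follow it; @order keeps the ids in file order, which a hash does not.
sub collect_transactions {
    my ($fh) = @_;
    my (%by_id, @order, $tid);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(t\s*#\s*[0-9,\s-]+)/) {
            $tid = $1;
            push @order, $tid;
            $by_id{$tid} = [];     # a repeated id would overwrite the earlier one
            next;
        }
        push @{ $by_id{$tid} }, $line if defined $tid;
    }
    return (\%by_id, \@order);
}

# Small in-memory sample in the question's format.
my $sample = <<'END';
t # 3-0, 1
v 0 0
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-1, 1
v 1 15
u 0 1 2
u 0 2 2
u 1 2 2
END

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my ($by_id, $order) = collect_transactions($fh);
close $fh;

# Direct lookup by id ...
my @one = @{ $by_id->{'t # 3-1, 1'} };
print join("\t", 't # 3-1, 1', @one[-3 .. -1]), "\n";

# ... or every transaction, in file order.
for my $id (@$order) {
    my @lines = @{ $by_id->{$id} };
    print join("\t", $id, @lines[-3 .. -1]), "\n";
}
```

Opening a reference to a scalar (\$sample) gives an in-memory filehandle, so the same sub works unchanged on a real file opened with open my $fh, '<', $file.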

The above code relies on there always being at least three lines in a transaction, as is implicit in the question. I'd recommend adding a check.
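One possible form of that check, as a sketch: a hypothetical last_three helper that warns and returns an empty list when a transaction is too short, instead of silently producing undef entries. The transaction id 't # 3-9, 1' below is made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last three lines of a transaction, or an
# empty list (after a warning) when it has fewer than three lines, instead
# of letting @lines[-3..-1] fill in undef values.
sub last_three {
    my ($tid, @lines) = @_;
    if (@lines < 3) {
        warn "Transaction '$tid' has only ", scalar(@lines), " line(s); skipped\n";
        return;
    }
    return @lines[-3 .. -1];
}

my @ok    = last_three('t # 3-0, 1', 'v 0 0', 'u 0 1 2', 'u 0 2 2', 'u 1 2 2');
my @short = last_three('t # 3-9, 1', 'v 0 0');    # hypothetical id; warns on STDERR

print "@ok\n";                            # u 0 1 2 u 0 2 2 u 1 2 2
print scalar(@short), " lines kept\n";    # 0 lines kept
```

In the loop above, the push could then become if (my @last = last_three($tid, @trans_lines)) { push @transactions, [ $tid, @last ] }.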

The construct while (<>) reads lines from all files given on the command line, or from STDIN if none are given.
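For illustration, a sketch of how <> behaves with more than one file: $ARGV holds the name of the file currently being read, and the close ARGV if eof idiom (documented under eof in perlfunc) resets the line counter $. for each file. The helper name headers_from_argv and the two temporary files are assumptions made only for this demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return "file line N: text" for every transaction
# header that <> reads from the files currently listed in @ARGV.
sub headers_from_argv {
    my @found;
    while (<>) {
        chomp;
        push @found, "$ARGV line $.: $_" if /^t\s*#/;
    }
    continue {
        close ARGV if eof;    # reset $. at the end of each file (not eof()!)
    }
    return @found;
}

# Demo: two temporary files stand in for names given on the command line.
my ($fh1, $name1) = tempfile(UNLINK => 1);
print {$fh1} "t # 3-0, 1\nv 0 0\n";
close $fh1;
my ($fh2, $name2) = tempfile(UNLINK => 1);
print {$fh2} "v 1 1\nt # 3-1, 1\n";
close $fh2;

@ARGV = ($name1, $name2);
print "$_\n" for headers_from_argv();
```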


Some comments on the posted code

  • use warnings; is better than the -w switch: it is lexically scoped, while -w turns warnings on globally, including inside modules you load

  • The $! variable holds the error string from a failed system call. While it should indeed be used with every call that can fail, if @ARGV is empty, shift simply returns undef and there is no error, so $! is not set. Instead, do something like

    my $file = shift @ARGV // die "Usage: $0 file\n";

    or, better yet, call a routine that prints a fuller usage message.

  • Use lexical filehandles, as in open my $fh, '<', $file or die "Can't open $file: $!"; since they are better than globs (FH) in several ways: they are scoped to their block, are closed automatically when they go out of scope, and can be passed around like any other scalar

  • There is no need to double-quote a lone scalar variable, as in "$input", since it is evaluated the same way without the quotes (and needless quoting can even lead to subtle problems in some situations, for example by stringifying a reference)

  • Nesting loops that read from the same resource (here a filehandle) is legitimate and has its uses, but it adds a layer of complexity and makes the code harder to follow. I'd use it very, very sparingly; multiple levels of nesting add that much more complexity.
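Put together, the points above might look like this. It is a minimal sketch in which the helper names file_arg and open_input are hypothetical, and an in-memory scalar reference stands in for a real file so that the demo runs without one.

```perl
use strict;
use warnings;

# Hypothetical helper: take the input file name from an argument list,
# failing with a usage message (not $!) when none was given.
sub file_arg {
    my ($args) = @_;
    return shift(@$args) // die "Usage: $0 file\n";
}

# Hypothetical helper: lexical filehandle plus three-argument open;
# on failure $! explains why (e.g. "No such file or directory").
sub open_input {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!\n";
    return $fh;
}

# Demo with an in-memory "file" (a scalar reference) instead of a real path;
# a real script would call file_arg(\@ARGV).
my $text = "t # 3-0, 1\nv 0 0\n";
my @args = (\$text);
my $fh   = open_input(file_arg(\@args));
print while <$fh>;
close $fh;
```

Using // rather than or also accepts a file literally named 0, which shift @ARGV or die would reject because "0" is false.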

I don't readily see why the code in the question doesn't work. Perhaps add print statements to see what is actually being read?
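In that spirit, a sketch of a debugging aid: for each line, report which of the question's patterns (if any) matches, and show the line with invisible characters such as \r made visible. The describe helper is hypothetical; the four patterns are copied from the question's code.

```perl
use strict;
use warnings;

# The four patterns from the question's code, in the order it tries them.
my @patterns = (
    [ transaction => qr/^(t\h*#\h*[0-9,\h-]+)/ ],
    [ edge_1      => qr/^(u\h+[0]\h+[1]\h+[2])/ ],
    [ edge_2      => qr/^(u\h+[0]\h+[2]\h+[2])/ ],
    [ edge_3      => qr/^(u\h+[1]\h+[2]\h+[2])/ ],
);

# Hypothetical helper: name the first pattern that matches the line
# (or "no match") and show the line with \r, \n and \t made visible.
sub describe {
    my ($line) = @_;
    my $hit = 'no match';
    for my $p (@patterns) {
        if ($line =~ $p->[1]) { $hit = $p->[0]; last }
    }
    (my $shown = $line) =~ s/([\r\n\t])/sprintf('\\x%02X', ord $1)/ge;
    return sprintf '%-12s [%s]', $hit, $shown;
}

# In the real script this could be: print describe($_), "\n" while <FILE>;
print describe($_), "\n" for "t # 3-0, 1\r\n", "v 0 0\n", "u 0 1 2\n";
```

The first demo line prints as transaction  [t # 3-0, 1\x0D\x0A], which makes a stray carriage return obvious.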

Your code as it is gives me this output:

t # 3-0, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-1, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-2, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-3, 1  u 0 1 2 u 0 2 2 u 1 2 2

So it seems the problem lies in something you haven't shown us. Perhaps the input file comes from a different system and has line endings that your system doesn't recognise.
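If foreign line endings are the cause, one way to rule them out, as a sketch: slurp the file and split it on any of \r\n, \r or \n. This matters most for old Mac-style files that end lines with a bare \r; Perl on Unix then reads the whole file as one single line, so in the question's code the first pattern matches once, the inner loops never see another line, and nothing is printed. The helper names split_lines and read_lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into lines whatever the line-ending
# convention: Windows (\r\n), old Mac (\r) or Unix (\n).
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;    # trailing empty fields are dropped
}

# Hypothetical helper: slurp a file in :raw mode, so no I/O layer
# translates line endings, then split it with split_lines().
sub read_lines {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Can't open '$file': $!\n";
    my $text = do { local $/; <$fh> };
    close $fh;
    return split_lines($text // '');
}

# A file with old-Mac endings is a single "line" to a plain <FILE> loop:
my @lines = split_lines("t # 3-0, 1\rv 0 0\ru 0 1 2\r");
print scalar(@lines), " lines\n";    # prints "3 lines"
```

With read_lines, the lines can then be processed with a foreach loop instead of while (<FILE>).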

Your nested while loops and if conditions make the code more complex than it needs to be (and therefore harder to maintain). You can do it all in one loop, using something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $input = shift @ARGV // die "Usage: $0 file\n";

open (my $fh, '<', $input) or die "Can't open '$input': $!";

my ($transaction_id, $edge_1, $edge_2, $edge_3);

while (<$fh>) {
  if (m/^(t\h*#\h*[0-9,\h-]+)/) {
    $transaction_id = $1;
  } elsif (m/^(u\h+[0]\h+[1]\h+[2])/) {
    $edge_1 = $1;
  } elsif (m/^(u\h+[0]\h+[2]\h+[2])/) {
    $edge_2 = $1;
  } elsif (m/^(u\h+[1]\h+[2]\h+[2])/) {
    $edge_3 = $1;
  }

  if ($transaction_id and $edge_1 and $edge_2 and $edge_3) {
    print "$transaction_id\t$edge_1\t$edge_2\t$edge_3\n";
    ($transaction_id, $edge_1, $edge_2, $edge_3) = (undef) x 4;
  }
}

(Note: I've also replaced -w with use warnings, and switched to a lexical filehandle and the three-argument version of open(). All of these are Modern Perl best practices.)

Would you please try the following:

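#!/usr/bin/perl
use strict;
use warnings;

my $file = shift @ARGV // die "Usage: $0 file\n";
open(my $fh, '<', $file) or die "Can't open '$file': $!";

my (@refs, $ref);
while (<$fh>) {
    chomp;                     # remove the newline (chop would remove any last character)
    if (/^t\s*#/) {            # if a new transaction starts
        $ref = [];             # then create a new reference to an array
        push(@refs, $ref);     # and memorize the reference
    }
    next unless $ref;          # ignore anything before the first transaction
    push(@$ref, $_);           # append the line to the current array
}
close($fh);

for my $tr (@refs) {
    print(join(" " x 4, $tr->[0], $tr->[-3], $tr->[-2], $tr->[-1]), "\n");
}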

Output:

t # 3-0, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-1, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-2, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-3, 1    u 0 1 2    u 0 2 2    u 1 2 2

Define the regex patterns $skip, $data and $tran, walk through the data, assemble each transaction line, and push it onto an array when a new transaction starts:

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $skip = qr/^v \d+ \d+/;
my $data = qr/^u \d+ \d+ \d+/;
my $tran = qr/^t # \d+-\d+, \d+/;    # \d+ also allows multi-digit ids such as 3-10

my @array;
my $line = <DATA>;

chomp($line);

while (<DATA>) {
    next if /$skip/;
    chomp;
    $line .= '  ' . $_ if /$data/;
    if( /$tran/ ) {
        push @array, $line;
        $line = $_;
    }
}

push @array, $line;

say Dumper(\@array);

__DATA__
t # 3-0, 1
v 0 0
v 1 19
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-1, 1
v 0 0
v 1 15
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-2, 1
v 0 0
v 1 17
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-3, 1
v 0 0
v 1 18
v 2 7
u 0 1 2
u 0 2 2
u 1 2 2

Output

$VAR1 = [
          't # 3-0, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-1, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-2, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-3, 1  u 0 1 2  u 0 2 2  u 1 2 2'
        ];
