简体   繁体   中英

regex match does not produce the output in perl

I have a test file that looks like that:

t # 3-0, 1
v 0 0
v 1 19
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-1, 1
v 0 0
v 1 15
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-2, 1
v 0 0
v 1 17
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-3, 1
v 0 0
v 1 18
v 2 7
u 0 1 2
u 0 2 2
u 1 2 2

I wrote the following code to match the last three lines of the transaction (each transaction starts with t # )

#!/usr/bin/perl -w
    
use strict;
    
my $input = shift @ARGV or die $!; 
    
open (FILE, "$input") or die $!;

LOOP: while (<FILE>) {
         if (m/^(t\h*#\h*[0-9,\h-]+)/) {
             my $transaction_id = $1;
             while (<FILE>) {
                if (m/^(u\h+[0]\h+[1]\h+[2])/) {
                    my $edge_1 = $1;
                    while (<FILE>) {
                        if (m/^(u\h+[0]\h+[2]\h+[2])/) {
                            my $edge_2 = $1;
                            while (<FILE>) {
                                if (m/^(u\h+[1]\h+[2]\h+[2])/) {
                                    my $edge_3 = $1;
                                    print $transaction_id . "\t" . $edge_1 . "\t" . $edge_2 . "\t" . $edge_3 . "\n";
                                    next LOOP;
                                         }
                                     }
                                 }
                             }
                         }
                     }
                 }
             }
    
close FILE;

However, it does not print any results. When I compile my program, it runs without errors. My ultimate goal is to produce output like this, where I output edges of subgraphs "u 0 1 2", "u 0 2 2" and "u 1 2 2":

t # 3-0, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-1, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-2, 1   u 0 1 2   u 0 2 2   u 1 2 2
t # 3-3, 1   u 0 1 2   u 0 2 2   u 1 2 2

One way: Keep all lines for a transaction in a buffer, and when you get to a new transaction id store the previous one, along with the last three lines from that buffer

use warnings;
use strict;
use feature 'say';

my (@transactions, @trans_lines, $tid);

while (<>) { 
    chomp;

    if (/^(t\s*#\s*[0-9,\s-]+)/) { 
        if (not $tid) {      
            $tid = $1;   # the very first one starts
            next;
        }

        # Store previous id and its last three lines, reset
        push @transactions, [ $tid, @trans_lines[-3..-1] ];
        $tid = $1; 
        @trans_lines = ()
    }   

    push @trans_lines, $_; 
}


say "@$_" for @transactions;

This stores all transactions in an array, so they are easily iterated and maintain the order from the file. This supports the use of results demonstrated in the question. But with an array one can't easily refer to a particular one, and if it is of interest to be able to look up particular id's consider using a hash of array references instead, like in the related problem .

The above code relies on there always being three lines in a transaction, as implicit in the question. I'd recommend adding a check.

The construct while (<>) reads lines of all files given on the command line, or STDIN .


Some comments on the posted code

  • The use warnings; is better than using -w switch

  • The $! variable holds the error string. While it should indeed be used ubiquitously, if @ARGV is empty the shift returns an undef and there is no error; so $! is not set. Instead, do something like

    my $file = shift @ARGV // die "Usage: $0 file\n";

    or, better yet, invoke your routine with a fuller usage message, etc.

  • Use lexical filehandles , open my $fh, '<', $file or die $;; , as they are plainly better in multiple ways than globs ( FH )

  • There is no need to double-quote a lone scalar variable, as it will get evaluated anyway (while excessive quoting can even lead to subtle problems in some situations)

  • Nesting loops that read from the same resource (filehandle here) is legitimate and has its uses, but it adds a layer of complexity and makes the code harder to track. I'd use it very, very sparingly. Multiple levels of nesting add that much more complexity.

I don't readily see why the code in the questions doesn't work. Add printing statements?

Your code as it is gives me this output:

t # 3-0, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-1, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-2, 1  u 0 1 2 u 0 2 2 u 1 2 2
t # 3-3, 1  u 0 1 2 u 0 2 2 u 1 2 2

So it seems the problem is with something that you haven't shown us. Perhaps the input file comes from a different system and has line endings that your system doesn't recognise.

Your nested while loops and if conditions make the code more complex than it needs to be (and, therefore, harder to maintain). You can do it all in one loop using something like this:

#!/usr/bin/perl

use strict;
use warnings;

my $input = shift @ARGV or die $!;

open (my $fh, '<', $input) or die $!;

my ($transaction_id, $edge_1, $edge_2, $edge_3);

while (<$fh>) {
  if (m/^(t\h*#\h*[0-9,\h-]+)/) {
    $transaction_id = $1;
  } elsif (m/^(u\h+[0]\h+[1]\h+[2])/) {
    $edge_1 = $1;
  } elsif (m/^(u\h+[0]\h+[2]\h+[2])/) {
    $edge_2 = $1;
  } elsif (m/^(u\h+[1]\h+[2]\h+[2])/) {
    $edge_3 = $1;
  }

  if ($transaction_id and $edge_1 and $edge_2 and $edge_3) {
    print "$transaction_id\t$edge_1\t$edge_2\t$edge_3\n";
    ($transaction_id, $edge_1, $edge_2, $edge_3) = (undef) x 4;
  }
}

(Note, I've also replaced -w with use warnings and switched to using a lexical filehandle and the three-arg version of open() . All of these are Modern Perl best practices.)

Would you please try the following:

#!/usr/bin/perl -w

my $ref;
open(FH, shift) or die;
while (<FH>) {
    chop;
    if (/^t\s*#/) {            # if a new transaction starts
        $ref = [];             # then create a new reference to an array
        push(@refs, $ref);     # and memorize the reference
    }
    push(@$ref, $_);           # append the line to the current array
}
for $ref (@refs) {
    print(join(" " x 4, $ref->[0], $ref->[-3], $ref->[-2], $ref->[-1]), "\n");
}

Output:

t # 3-0, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-1, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-2, 1    u 0 1 2    u 0 2 2    u 1 2 2
t # 3-3, 1    u 0 1 2    u 0 2 2    u 1 2 2

Define regex patterns $skip , $data and $tran , walk through data, assemble transaction line and push into array when new transaction starts

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $skip = qr/^v \d+ \d+/;
my $data = qr/^u \d+ \d+ \d+/;
my $tran = qr/^t # \d-\d, \d/;

my @array;
my $line = <DATA>;

chomp($line);

 while( <DATA> ) {
    next if /$skip/;
    chomp;
    $line .= '  ' . $_ if /$data/;
    if( /$tran/ ) {
        push @array, $line;
        $line = $_;
    }
}

push @array, $line;

say Dumper(\@array);

__DATA__
t # 3-0, 1
v 0 0
v 1 19
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-1, 1
v 0 0
v 1 15
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-2, 1
v 0 0
v 1 17
v 2 2
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-3, 1
v 0 0
v 1 18
v 2 7
u 0 1 2
u 0 2 2
u 1 2 2

Output

$VAR1 = [
          't # 3-0, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-1, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-2, 1  u 0 1 2  u 0 2 2  u 1 2 2',
          't # 3-3, 1  u 0 1 2  u 0 2 2  u 1 2 2'
        ];

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM