简体   繁体   中英

retrieve patterns that exactly match all regex in Perl

I have a database of subgraphs that looks like this:

t # 3-231, 1
v 0 94
v 1 14
v 2 16
v 3 17
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-232, 1
v 0 14
v 1 94
v 2 19
v 3 91
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-233, 1
v 0 17
v 1 91
v 2 16
v 3 94
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-234, 1
v 0 90
v 1 93
v 2 102
v 3 95
u 0 1 2
u 0 3 2
u 1 2 2

I would like to retrieve all transactions that contains the following patterns: 'u 0 1 2' and 'u 0 2 2' along with transaction id (ex. line starts with t #).

I used the following code to accomplish this job:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;

while (<FILE>) {

my @fields = ('t', 'u\ 0\ 1', 'u\ 0\ 2');  
my $field_regex = join( "|", @fields );
my @field_lines;

    push( @field_lines, $_ ) if ( /^(?:$field_regex) / );
    last if @field_lines == @fields;

push @field_lines, "";

print join( "\n", sort @field_lines );
}

close FILE;

However, it retrieves patterns, when only one line match, such as:

t # 3-231, 1
u 0 1 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-233, 1
u 0 1 2

My ultimate goal is to retrieve transactions that completely match to my regex, such as

t # 3-231, 1
u 0 1 2
u 0 2 2

Thank you for your help!

Olha

One way: keep the current transaction-id on hand, and store lines of interest in an arrayref associated with that transaction-id key in a hash.

use warnings;
use strict;
use feature 'say';    
use Data::Dump qw(dd);

my @fields = ('u 0 1', 'u 0 2');  
my $field_regex = join '|', map { quotemeta } @fields;
    
my (%trans, $tid);

while (<>) {
    chomp;
    if (/^t #/) { 
        $tid = $_; 
        next;
    }   
  
    push @{$trans{$tid}}, $_  if /$field_regex/;
}

dd %trans;

# foreach my $tid (sort keys %trans) { 
#     say $tid;
#     say for @{$trans{$tid}};
# }

I use while (<>) which reads line by line all files given on command line when the program is invoked (or STDIN ), for simplicity here. I use Data::Dump to show a complex data structure; there is Data::Dumper in the core for that.

The qotemeta escapes all ASCI non-"word" characters, that can throw off regex, and this includes spaces.

The program above in general loses the order of transaction-id's from the file, since hash keys are unordered, while it keeps the order of lines for each id since those are on an array. This is not hard to remedy if needed.

Tested only with the provided data file.

This type of pattern is most easily handled by treating the file contents as blocks rather than lines.

Here is an easy example (with your data):

use strict;

my $big_string;
my $trans;
my $block;

open my $fh, '<', '/tmp/file.txt' or die "Can't open file $!";

$big_string = do { local $/; <$fh> };

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    if ($block=~/^(u\h+0\h+[12]\h+2)/m){
        print "$trans\n";
        for ($block=~/^(u\h+0\h+[12]\h+2)/mg) {
            print "$1\n";
        }
    }
}   

Prints:

t # 3-231, 1
u 0 2 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-234, 1
u 0 1 2

This assumes that your data fits easily into memory. If not, there are many ways to read a file block by block as well.

If you only want the blocks that have more than one match to the second regex:

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    @matches=$block=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches >1) {
        print "$trans\n";
        print join "\n", @matches;
    }
}   

Prints:

t # 3-231, 1
u 0 1 2
u 0 2 2

And, of course, TIMTOWDI:

my @result = do{ local @_ = split(/^(t\h+#[\h\d,-]+)/m, $big_string); 
                        @_[1..$#_]};

for my $i (0 .. @result/2-1) {
    @matches=$result[2*$i+1]=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches>1){
        print "$result[2*$i]\n";
        print join("\n", @matches);
    }
}   
t # 3-231, 1
u 0 1 2
u 0 2 2

perl -lne '@h=($_) if /^t #/; push @h,$_ if /^u 0 [12] 2/; if (@h==3) { print shift @h while @h }' file

reset & hold the transaction line; append the matching lines; print and reset if you accumulate 3 lines.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM