简体   繁体   English

检索与 Perl 中的所有正则表达式完全匹配的模式

[英]retrieve patterns that exactly match all regex in Perl

I have a database of subgraphs that looks like this:我有一个如下所示的子图数据库:

t # 3-231, 1
v 0 94
v 1 14
v 2 16
v 3 17
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-232, 1
v 0 14
v 1 94
v 2 19
v 3 91
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-233, 1
v 0 17
v 1 91
v 2 16
v 3 94
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-234, 1
v 0 90
v 1 93
v 2 102
v 3 95
u 0 1 2
u 0 3 2
u 1 2 2

I would like to retrieve all transactions that contains the following patterns: 'u 0 1 2' and 'u 0 2 2' along with transaction id (ex. line starts with t #).我想检索包含以下模式的所有事务:“u 0 1 2”和“u 0 2 2”以及事务 ID(例如,行以 t # 开头)。

I used the following code to accomplish this job:我使用以下代码来完成这项工作:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;

while (<FILE>) {

my @fields = ('t', 'u\ 0\ 1', 'u\ 0\ 2');  
my $field_regex = join( "|", @fields );
my @field_lines;

    push( @field_lines, $_ ) if ( /^(?:$field_regex) / );
    last if @field_lines == @fields;

push @field_lines, "";

print join( "\n", sort @field_lines );
}

close FILE;

However, it retrieves patterns, when only one line match, such as:但是,当只有一行匹配时,它会检索模式,例如:

t # 3-231, 1
u 0 1 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-233, 1
u 0 1 2

My ultimate goal is to retrieve transactions that completely match to my regex, such as我的最终目标是检索与我的正则表达式完全匹配的事务,例如

t # 3-231, 1
u 0 1 2
u 0 2 2

Thank you for your help!谢谢您的帮助!

Olha奥尔哈

One way: keep the current transaction-id on hand, and store lines of interest in an arrayref associated with that transaction-id key in a hash.一种方法:保留当前的事务 ID,并将感兴趣的行存储在与 hash 中的事务 ID 键关联的数组引用中。

use warnings;
use strict;
use feature 'say';    
use Data::Dump qw(dd);

my @fields = ('u 0 1', 'u 0 2');  
my $field_regex = join '|', map { quotemeta } @fields;
    
my (%trans, $tid);

while (<>) {
    chomp;
    if (/^t #/) { 
        $tid = $_; 
        next;
    }   
  
    push @{$trans{$tid}}, $_  if /$field_regex/;
}

dd %trans;

# foreach my $tid (sort keys %trans) { 
#     say $tid;
#     say for @{$trans{$tid}};
# }

I use while (<>) which reads line by line all files given on command line when the program is invoked (or STDIN ), for simplicity here.为了简单起见,我使用while (<>)在调用程序(或STDIN )时逐行读取命令行上给出的所有文件。 I use Data::Dump to show a complex data structure;我使用Data::Dump来展示一个复杂的数据结构; there is Data::Dumper in the core for that.为此,核心中有Data::Dumper

The qotemeta escapes all ASCI non-"word" characters, that can throw off regex, and this includes spaces. qotemeta转义所有 ASCI 非“单词”字符,这些字符可能会抛出正则表达式,其中包括空格。

The program above in general loses the order of transaction-id's from the file, since hash keys are unordered, while it keeps the order of lines for each id since those are on an array.上面的程序通常会丢失文件中事务 ID 的顺序,因为 hash 键是无序的,而它保持每个 id 的行顺序,因为它们在数组上。 This is not hard to remedy if needed.如果需要,这并不难补救。

Tested only with the provided data file.仅使用提供的数据文件进行测试。

This type of pattern is most easily handled by treating the file contents as blocks rather than lines.通过将文件内容视为块而不是行来最容易处理这种类型的模式。

Here is an easy example (with your data):这是一个简单的示例(使用您的数据):

use strict;

my $big_string;
my $trans;
my $block;

open my $fh, '<', '/tmp/file.txt' or die "Can't open file $!";

$big_string = do { local $/; <$fh> };

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    if ($block=~/^(u\h+0\h+[12]\h+2)/m){
        print "$trans\n";
        for ($block=~/^(u\h+0\h+[12]\h+2)/mg) {
            print "$1\n";
        }
    }
}   

Prints:印刷:

t # 3-231, 1
u 0 2 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-234, 1
u 0 1 2

This assumes that your data fits easily into memory.这假设您的数据很容易适合 memory。 If not, there are many ways to read a file block by block as well.如果没有,也有很多方法可以逐块读取文件。

If you only want the blocks that have more than one match to the second regex:如果您只想要与第二个正则表达式有多个匹配的块:

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    @matches=$block=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches >1) {
        print "$trans\n";
        print join "\n", @matches;
    }
}   

Prints:印刷:

t # 3-231, 1
u 0 1 2
u 0 2 2

And, of course, TIMTOWDI:当然,还有 TIMTOWDI:

my @result = do{ local @_ = split(/^(t\h+#[\h\d,-]+)/m, $big_string); 
                        @_[1..$#_]};

for my $i (0 .. @result/2-1) {
    @matches=$result[2*$i+1]=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches>1){
        print "$result[2*$i]\n";
        print join("\n", @matches);
    }
}   
t # 3-231, 1
u 0 1 2
u 0 2 2

perl -lne '@h=($_) if /^t #/; push @h,$_ if /^u 0 [12] 2/; if (@h==3) { print shift @h while @h }' file

reset & hold the transaction line;重置并保留交易线; append the matching lines; append 配套线; print and reset if you accumulate 3 lines.如果累积 3 行,则打印并重置。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM