
Perl: match regex from the file

I have a tab-delimited file that contains information about itemsets. Each itemset consists of one to three items:

MTMR14_Q1   NOTCH1_Q3   PRKCD_Q1        
MTMR14_Q1   NOTCH1_Q3   TFRC_Q3     
MTMR14_Q1   NOTCH1_Q3           
MTMR14_Q1           
MTMR14_Q1   PASD1_Q3

My goal is to retrieve itemsets with three items only:

MTMR14_Q1   NOTCH1_Q3   PRKCD_Q1        
MTMR14_Q1   NOTCH1_Q3   TFRC_Q3 

I have written the following code, but it does not retrieve any itemsets:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;
while (<FILE>) {
    my $seq = $_;
    chomp $seq;

    # using the binding operator to match a string to a regular expression
    if ($seq =~ /[A-Z]\t[A-Z]\t[A-Z]/) {
        print $seq . "\n";
    }
}
close FILE;

Could you please pinpoint my error?

Thank you! Olha

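[A-Z] matches a single uppercase letter, so your pattern only matches when the middle field is exactly one letter long. Fields such as NOTCH1_Q3 are longer than that, so no line matches.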
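To see this concretely, here is a small check that runs the pattern from the question against two strings. Both strings are made up for illustration: one has single-letter fields, the other is modeled on a line from the file.

```perl
use strict;
use warnings;

my $pattern = qr/[A-Z]\t[A-Z]\t[A-Z]/;   # the pattern from the question

# Hypothetical test strings: single-letter fields vs. a realistic itemset.
my $single_letters = "A\tB\tC";
my $real_itemset   = "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1";

# The middle [A-Z] must sit directly between two tabs, so only a
# one-letter middle field can satisfy the pattern.
my $matches_single = $single_letters =~ $pattern ? 1 : 0;
my $matches_real   = $real_itemset   =~ $pattern ? 1 : 0;

print "single letters: $matches_single\n";
print "real itemset:   $matches_real\n";
```

The first string matches and the second does not, which is why the original script prints nothing.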


Skip lines that don't contain exactly 3 fields:

next if $seq !~ /^ [^\t]* \t [^\t]* \t [^\t]* \z/x;

[^\t]* matches any number of non-tab characters.


Skip lines that don't contain exactly 3 non-empty fields:

next if $seq !~ /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x;

[^\t]+ matches any one-or-more non-tab characters.
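As a rough sketch of the non-empty variant in action, the snippet below applies it to lines modeled on the sample data. It assumes the lines carry no trailing tabs (the anchored pattern would reject padded lines), and the array name @three_item is made up for this example.

```perl
use strict;
use warnings;

# Sample lines modeled on the question's data (no trailing tabs assumed).
my @lines = (
    "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1",
    "MTMR14_Q1\tNOTCH1_Q3\tTFRC_Q3",
    "MTMR14_Q1\tNOTCH1_Q3",
    "MTMR14_Q1",
    "MTMR14_Q1\tPASD1_Q3",
);

my @three_item;
for my $seq (@lines) {
    # Under /x the literal spaces in the pattern are ignored;
    # \t still means a tab character.
    next if $seq !~ /^ [^\t]+ \t [^\t]+ \t [^\t]+ \z/x;
    push @three_item, $seq;
}

print "$_\n" for @three_item;
```

Only the two three-item lines survive the filter.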


Presumably, you'll follow up by parsing the lines to get the three fields. If so, you could split first and check the field count afterwards:

my @fields = split /\t/, $seq, -1;

next if @fields != 3;                    # Require exactly 3 fields.

next if ( grep length, @fields ) != 3;   # Require exactly 3 non-empty fields.
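
Putting the split-based checks into a loop might look like the sketch below. The subroutine name three_item_sets is hypothetical, and the input is read from an in-memory filehandle so the example is self-contained; in a real script the handle would come from opening the file named on the command line.

```perl
use strict;
use warnings;

# Return the lines that have exactly three non-empty, tab-separated fields.
# (three_item_sets is a made-up name for this sketch.)
sub three_item_sets {
    my ($fh) = @_;
    my @kept;
    while ( my $seq = <$fh> ) {
        chomp $seq;
        my @fields = split /\t/, $seq, -1;       # -1 keeps trailing empty fields
        next if @fields != 3;                    # wrong number of fields
        next if ( grep length, @fields ) != 3;   # at least one field is empty
        push @kept, $seq;
    }
    return @kept;
}

# Illustrative data; the second line has three fields but the last is empty.
my $data = "MTMR14_Q1\tNOTCH1_Q3\tPRKCD_Q1\n"
         . "MTMR14_Q1\tNOTCH1_Q3\t\n"
         . "MTMR14_Q1\tPASD1_Q3\n";

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @result = three_item_sets($fh);
close $fh;

print "$_\n" for @result;
```

Without the -1 limit, split would drop trailing empty fields, so the second line would be rejected for having two fields rather than for having an empty one. The outcome is the same here, but keeping the empty fields makes the two checks mean what their comments say.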
