Perl script to find fields matching in two files

I have two files and would like to find the lines where fields 1 and 2 match between them, then print the third field from the second file for each match. File 1 looks like:

#CHR BP
#1 9690639
#1 7338706
#1 7338707
#1 7338717

File 2 looks like:

#1 10036 rs11928874 CT C 315.21 VQSRTrancheINDEL99.99to100.00AC=3;AF=0.063;AN=48;BaseQRankSum=0.297;DP=1469;FS=16.265;InbreedingCoeff=-0.0941;MLEAC=3;MLEAF=0.063;MQ=14.67;MQ0=0;MQRankSum=1.339

I wrote the following perl script, which outputs far too many lines that do not fit the matching criteria:

my @loci;
open IN, "highalt_results.txt";
while (<IN>) {
    my @L = split;
    next if m/CHR/;
    push @loci, [ $L[0], $L[1] ];
}
close IN;

my $F = shift @ARGV;
open IN, "$F";
while (<IN>) {
    my @L = split;
    next if m/#CHROM/;
    foreach (@loci) {
        if ( $L[0] = ${$_}[0] ) {
            if ( $L[1] = ${$_}[1] ) {
                print "${$_}[0] ${$_}[1] $L[2]\n";
                next;
            }
        }
    }
}

Can someone point out where the script is going wrong?

I think this will be where your error is:

    if ( $L[0] = ${$_}[0] ) {
        if ( $L[1] = ${$_}[1] ) {

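A single = is an assignment, not a comparison: the condition is true whenever the assigned value is true (any non-zero, non-empty value), and it overwrites $L[0] and $L[1] as a side effect. You probably want == for a numeric comparison, or eq for a string comparison.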
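To make the difference concrete, here is a minimal sketch. The variable names and the value 7 are made up for illustration; 9690639 is borrowed from the File 1 sample.

```perl
use strict;
use warnings;

# Hypothetical values standing in for one field from each file.
my $field  = 7;
my $wanted = 9690639;

# Assignment inside a condition: copies $wanted into $copy, then tests the
# assigned value, which is true for any non-zero, non-empty value.
my $copy = $field;
my $assign_is_true = ( $copy = $wanted ) ? 1 : 0;

# Numeric and string comparisons leave both operands untouched.
my $num_equal = ( $field == $wanted ) ? 1 : 0;
my $str_equal = ( $field eq $wanted ) ? 1 : 0;

print "assignment: true=$assign_is_true, copy is now $copy\n";
print "==: $num_equal, eq: $str_equal\n";
```

The assignment reports true even though 7 and 9690639 differ, and it clobbers the left-hand variable, which is why the original script prints a line for nearly every pair.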

More generally, I think there are several things you should do to tighten up your code:

  • use strict; and use warnings; catch many mistakes early.
  • Use 3-argument open with a lexical filehandle, e.g. open ( my $input, "<", $filename ) or die $!; this avoids a potential gotcha with a filename taken from @ARGV (consider a file called '>/etc/passwd').
  • Always check whether open succeeded.
  • I'd suggest not using the implicit variable $_ in your foreach loop, as ${$_}[0] isn't particularly nice. Naming the loop variable and using -> to dereference makes the code a lot more readable.
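The points above can be sketched together in one small example. The helper name read_loci and the temporary demo file are assumptions made for illustration, not part of the original script; the sample values come from File 1 with the leading # dropped.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read "CHR BP" pairs from a file, skipping the header line.
# read_loci is a hypothetical helper name, not part of the original script.
sub read_loci {
    my ($path) = @_;

    # Three-argument open: the mode is explicit, so a name such as
    # '>/etc/passwd' is treated as a file name to read, never as a mode.
    open( my $in, "<", $path ) or die "Cannot open '$path': $!";

    my @loci;
    while ( my $line = <$in> ) {
        next if $line =~ m/CHR/;
        my ( $chr, $bp ) = split ' ', $line;
        next unless defined $bp;    # skip blank or short lines
        push @loci, [ $chr, $bp ];
    }
    close $in;
    return @loci;
}

# Demo input written to a temporary file.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "CHR BP\n1 9690639\n1 7338706\n";
close $tmp_fh;

my @loci = read_loci($tmp_name);
foreach my $pair (@loci) {
    # $pair->[0] is the same element as ${$pair}[0], just easier to read.
    print "chr=$pair->[0] bp=$pair->[1]\n";
}
```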

I would probably rewrite as something like:

use strict;
use warnings;

my @loci;
open( my $loci_in, "<", "highalt_results.txt" ) or die $!;
while (<$loci_in>) {
    my ( $start, $end ) = split;
    next if m/CHR/;
    push @loci, [ $start, $end ];
}
close $loci_in;

my $filename = shift @ARGV;
open( my $input, "<", $filename ) or die $!;
while (<$input>) {
    next if m/#CHROM/;
    my ( $start, $end, $data ) = split;
    foreach my $pair (@loci) {
        if (    $start == $pair->[0]
            and $end == $pair->[1] )
        {
            print "$start $end $data\n";
        }
    }
}
close($input);
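If the list of loci is large, the nested loop compares every line of the second file against every locus. One possible alternative, which is a different technique from the loop above and not something the question asked for, is to index the loci in a hash keyed on both fields. In this sketch index_loci and lookup_ids are hypothetical helper names, the first data row is taken from the File 2 sample without its leading #, and the second row is made up. Keys are compared as strings, so "01" and "1" would not match.

```perl
use strict;
use warnings;

# Build a lookup keyed on "chr:bp" from array refs of [chr, bp].
# index_loci and lookup_ids are hypothetical helper names.
sub index_loci {
    my (@loci) = @_;
    my %wanted;
    $wanted{ join ':', $_->[0], $_->[1] } = 1 for @loci;
    return \%wanted;
}

# Return "chr bp id" strings for every data line whose first two fields
# appear in the lookup.
sub lookup_ids {
    my ( $wanted, @lines ) = @_;
    my @hits;
    foreach my $line (@lines) {
        next if $line =~ m/#CHROM/;    # skip the header line
        my ( $chr, $bp, $id ) = split ' ', $line;
        next unless defined $id;       # skip blank or short lines
        push @hits, "$chr $bp $id" if exists $wanted->{ join ':', $chr, $bp };
    }
    return @hits;
}

my $wanted = index_loci( [ 1, 10036 ], [ 1, 7338706 ] );
my @hits   = lookup_ids(
    $wanted,
    "#CHROM POS ID",
    "1 10036 rs11928874 CT C 315.21",
    "1 20000 example_id A G 50.00",    # made-up row that should not match
);
print "$_\n" for @hits;    # prints "1 10036 rs11928874"
```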

At the very least, you have errors here:

    if ( $L[0] = ${$_}[0] ) {
        if ( $L[1] = ${$_}[1] ) {

You should use == (numeric) or eq (string) for comparison.

Please also clarify the format of your data files; I can't see any matching fields in the samples.
