Perl script to find fields matching in two files

I have two files and would like to find the lines where fields 1 and 2 match in both, then print the third field from the second file for each match. File 1 looks like:

#CHR BP
#1 9690639
#1 7338706
#1 7338707
#1 7338717

File 2 looks like:

#1 10036 rs11928874 CT C 315.21 VQSRTrancheINDEL99.99to100.00AC=3;AF=0.063;AN=48;BaseQRankSum=0.297;DP=1469;FS=16.265;InbreedingCoeff=-0.0941;MLEAC=3;MLEAF=0.063;MQ=14.67;MQ0=0;MQRankSum=1.339

I wrote the following Perl script, but it outputs far too many lines, most of which do not fit the matching criteria:

my @loci;
open IN, "highalt_results.txt";
while (<IN>) {
    my @L = split;
    next if m/CHR/;
    push @loci, [ $L[0], $L[1] ];
}
close IN;

my $F = shift @ARGV;
open IN, "$F";
while (<IN>) {
    my @L = split;
    next if m/#CHROM/;
    foreach (@loci) {
        if ( $L[0] = ${$_}[0] ) {
            if ( $L[1] = ${$_}[1] ) {
                print "${$_}[0] ${$_}[1] $L[2]\n";
                next;
            }
        }
    }
}

Can someone point out where the script is going wrong?

I think this is where your error is:

    if ( $L[0] = ${$_}[0] ) {
        if ( $L[1] = ${$_}[1] ) {

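A single = is an assignment, not a comparison. The condition tests the value that was just assigned, so it is true whenever that value is true, which for these fields is practically always. It also overwrites $L[0] and $L[1] on every pass through the loop. You probably want == for a numeric comparison, or eq for a string comparison. eq is the safer choice if the first field can ever be non-numeric.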
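To make the difference concrete, here is a minimal sketch with made-up values (the variable names and data are illustrative, not taken from the real files). The assignment is done on a scratch copy so the original field survives for the comparisons that follow:

```perl
use strict;
use warnings;

my @fields = ( '1', '10036' );      # hypothetical fields from one line of file 2
my $locus  = [ '1', '9690639' ];    # hypothetical locus from file 1

# Assignment: copies 9690639 into $scratch and tests that value,
# which is true even though the two positions differ.
my $scratch       = $fields[1];
my $assign_result = ( $scratch = $locus->[1] ) ? 1 : 0;

# Comparison: tests the two values without modifying either.
my $numeric_result = ( $fields[1] == $locus->[1] ) ? 1 : 0;
my $string_result  = ( $fields[1] eq $locus->[1] ) ? 1 : 0;

print "assignment: $assign_result, ==: $numeric_result, eq: $string_result\n";
```

The assignment form reports a match for two different positions, which is why the original script prints far more lines than expected.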

More generally, I think there are several things you should be doing to tighten up your code:

  • use strict; and use warnings; are really worth having.
  • Three-argument open with lexical filehandles is good: open ( my $input, "<", $filename ) or die $!; This avoids a potential gotcha with a filename taken from @ARGV (consider a file called '>/etc/passwd').
  • You really should check whether open was successful.
  • I'd also suggest not using the implicit variable in your foreach loop, as ${$_}[0] isn't particularly nice. Using -> to dereference can make code a lot nicer.

I would probably rewrite it as something like:

use strict;
use warnings;

# Read the loci of interest (fields 1 and 2 of the first file).
my @loci;
open( my $loci_in, "<", "highalt_results.txt" ) or die $!;
while (<$loci_in>) {
    my ( $start, $end ) = split;
    next if m/CHR/;    # skip the header line
    push @loci, [ $start, $end ];
}
close $loci_in;

# Scan the second file, named on the command line.
my $filename = shift @ARGV;
open( my $input, "<", $filename ) or die $!;
while (<$input>) {
    next if m/#CHROM/;    # skip the header line
    my ( $start, $end, $data ) = split;
    foreach my $pair (@loci) {
        # Print field 3 only when both fields match.
        if (    $start == $pair->[0]
            and $end == $pair->[1] )
        {
            print "$start $end $data\n";
        }
    }
}
close($input);
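As a possible alternative to the nested loop (not part of the answer above), the loci can be stored in a hash keyed on both fields, so each line of the second file needs one lookup instead of a scan over every locus. This is only a sketch: build_lookup and match_lines are hypothetical helper names, the sample lines are made up, and it works on arrays of lines rather than filehandles to stay self-contained.

```perl
use strict;
use warnings;

# Build a lookup table keyed on "chromosome<TAB>position" from the loci lines.
sub build_lookup {
    my (@lines) = @_;
    my %wanted;
    for my $line (@lines) {
        next if $line =~ /CHR/;                 # skip the header line
        my ( $chr, $pos ) = split ' ', $line;
        next unless defined $pos;               # skip blank or short lines
        $wanted{"$chr\t$pos"} = 1;
    }
    return \%wanted;
}

# Return "chr pos id" for each data line whose first two fields are in the table.
sub match_lines {
    my ( $wanted, @lines ) = @_;
    my @hits;
    for my $line (@lines) {
        next if $line =~ /#CHROM/;              # skip the header line
        my ( $chr, $pos, $id ) = split ' ', $line;
        next unless defined $id;                # skip blank or short lines
        push @hits, "$chr $pos $id" if $wanted->{"$chr\t$pos"};
    }
    return @hits;
}

# Made-up sample data, for illustration only.
my $wanted = build_lookup( "CHR BP", "1 9690639", "1 7338706" );
my @hits   = match_lines(
    $wanted,
    "#CHROM POS ID",
    "1 10036 rsA",
    "1 7338706 rsB",
);
print "$_\n" for @hits;
```

Because hash keys are strings, this behaves like an eq comparison on both fields, so it also copes with non-numeric values in the first field.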

At the very least, you have errors here: if ( $L[0] = ${$_}[0] ) { if ( $L[1] = ${$_}[1] ) {

You should use == or eq for the comparison.

Please also clarify the format of your data files. I can't see any matching fields in the samples you posted.
