
Perl: matching data in two files

I would like to match and print data from two files (File1.txt and File2.txt). Currently, I'm trying to match the first letter of the second column in File1.txt to the first letter of the third column in File2.txt.

File1.txt
1  H  35
1  C  22
1  H  20

File2.txt
A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

OUTPUT
1  MET  HB2  35
1  MET  CA   22
1  MET  HA   20 

Here is my script. I tried following this submission: "In Perl, mapping between a reference file and a series of files".

#!/usr/bin/perl

use strict;
use warnings;

my %data;

open (SHIFTS,"file1.txt") or die;
open (PDB, "file2.txt") or die;

while (my $line = <PDB>) {
    chomp $line;
    my @fields = split(/\t/,$line);
    $data{$fields[4]} = $fields[2];
 }

 close PDB;

 while (my $line = <SHIFTS>) {
    chomp($line);
    my @columns = split(/\t/,$line);
    my $value = ($columns[1] =~ m/^.*?([A-Za-z])/ );
 }
    print "$columns[0]\t$fields[3]\t$value\t$data{$value}\n";

 close SHIFTS;
 exit;

Here's one way using split() hackery:

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = 'file1.txt';
my $f2 = 'file2.txt';

my @pdb;

open my $pdb_file, '<', $f2
  or die "Can't open the PDB file $f2: $!";

while (my $line = <$pdb_file>){
    chomp $line;
    push @pdb, $line; 
}

close $pdb_file;

open my $shifts_file, '<', $f1
  or die "Can't open the SHIFTS file $f1: $!";

while (my $line = <$shifts_file>){

    chomp $line;

    my $pdb_line = shift @pdb;

    # - inner split: get the third element from the $pdb_line
    # - outer split: get the first element (character) from the
    #   result of the inner split

    my $criteria = (split('', (split('\s+', $pdb_line))[2]))[0];

    # - compare the 2nd element of the file1.txt line against
    #   the above split() operations

    if ((split('\s+', $line))[1] eq $criteria){
        print "$pdb_line\n";
    }
    else {
        print "**** >$pdb_line< doesn't match >$line<\n";
    }
}
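
The nested split() calls can be hard to read, so here is a minimal sketch of the same idea on a single line. The helper name first_char_of_field is hypothetical and not part of the script above. It assumes whitespace-separated fields and uses substr() instead of the outer split('') to take the first character.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first character of the
# field at position $index (0-based) in a whitespace-separated line.
sub first_char_of_field {
    my ($line, $index) = @_;
    my @fields = split ' ', $line;    # split ' ' also ignores leading whitespace
    return substr($fields[$index], 0, 1);
}

print first_char_of_field('A  1 HB2 MET  1 ', 2), "\n";    # third field is HB2
print first_char_of_field('1  C  22', 1), "\n";            # second field is C
```

For the two sample lines this should print H and then C, which are the values the script compares with eq.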

Files:

file1.txt (note that I changed line two to check that a non-match is reported):

1  H  35
1  A  22
1  H  20

file2.txt:

A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

Output:

./app.pl
A  1 HB2 MET  1 
**** >A  2 CA  MET  1< doesn't match >1  A  22<
A  3 HA  MET  1
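
The script above prints the matching file2.txt line unchanged. If the merged layout shown under OUTPUT in the question is wanted instead, something along these lines might work. This is only a sketch under assumptions: lines are paired by position, as in the script above; the column order is inferred from the question's OUTPUT block; and merge_rows and its variable names are hypothetical. The sample data is inlined so the sketch runs without the files.

```perl
use strict;
use warnings;

# Hypothetical helper: pair the lines of both files by position and
# build one tab-separated row for every pair whose letters match.
sub merge_rows {
    my ($shift_lines, $pdb_lines) = @_;    # two array refs of lines
    my @rows;
    for my $i (0 .. $#{$shift_lines}) {
        last if $i > $#{$pdb_lines};       # stop when file2 runs out of lines
        my ($id, $letter, $value) = split ' ', $shift_lines->[$i];
        my (undef, undef, $atom, $residue) = split ' ', $pdb_lines->[$i];
        # keep the pair only if the third column of file2 starts with
        # the letter found in the second column of file1
        next unless substr($atom, 0, 1) eq $letter;
        push @rows, join("\t", $id, $residue, $atom, $value);
    }
    return @rows;
}

# Sample data copied from the question (File1.txt and File2.txt).
my @file1 = ('1  H  35', '1  C  22', '1  H  20');
my @file2 = ('A  1 HB2 MET  1 ', 'A  2 CA  MET  1', 'A  3 HA  MET  1');

print "$_\n" for merge_rows(\@file1, \@file2);
```

With the question's data, every pair matches, so three rows are printed. With the modified file1.txt above, the second row would be skipped.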
