Perl: matching data in two files

I would like to match and print data from two files (File1.txt and File2.txt). Currently, I'm trying to match the first letter of the second column in File1.txt to the first letter of the third column in File2.txt.

File1.txt
1  H  35
1  C  22
1  H  20

File2.txt
A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

OUTPUT
1  MET  HB2  35
1  MET  CA   22
1  MET  HA   20 

Here is my script. I tried to follow this submission: In Perl, mapping between a reference file and a series of files

#!/usr/bin/perl

use strict;
use warnings;

my %data;

open (SHIFTS,"file1.txt") or die;
open (PDB, "file2.txt") or die;

while (my $line = <PDB>) {
    chomp $line;
    my @fields = split(/\t/,$line);
    $data{$fields[4]} = $fields[2];
 }

 close PDB;

 while (my $line = <SHIFTS>) {
    chomp($line);
    my @columns = split(/\t/,$line);
    my $value = ($columns[1] =~ m/^.*?([A-Za-z])/ );
 }
    print "$columns[0]\t$fields[3]\t$value\t$data{$value}\n";

 close SHIFTS;
 exit;

Here's one way using split() hackery:

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = 'file1.txt';
my $f2 = 'file2.txt';

my @pdb;

open my $pdb_file, '<', $f2
  or die "Can't open the PDB file $f2: $!";

while (my $line = <$pdb_file>){
    chomp $line;
    push @pdb, $line; 
}

close $pdb_file;

open my $shifts_file, '<', $f1
  or die "Can't open the SHIFTS file $f1: $!";

while (my $line = <$shifts_file>){

    chomp $line;

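    # pair this line with the PDB line at the same position;
    # stop if the PDB file has fewer lines than the SHIFTS file
    my $pdb_line = shift @pdb;
    last unless defined $pdb_line;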

    # - inner split: get the third element from the $pdb_line
    # - outer split: get the first element (character) from the
    #   result of the inner split

    my $criteria = (split('', (split('\s+', $pdb_line))[2]))[0];

    # - compare the 2nd element of the file1.txt line against
    #   the above split() operations

    if ((split('\s+', $line))[1] eq $criteria){
        print "$pdb_line\n";
    }
    else {
        print "**** >$pdb_line< doesn't match >$line<\n";
    }
}
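
The nested split() expression in the script above can be unpacked into separate steps. Here is a minimal sketch using one sample line from file2.txt. The helper name first_char_of_third_field is made up for illustration, and substr() is swapped in for the outer split('') as another way to take the first character:

```perl
use strict;
use warnings;

# Return the first character of the third whitespace-separated field.
# (first_char_of_third_field is a hypothetical helper name.)
sub first_char_of_third_field {
    my ($line) = @_;
    # split ' ' is the special whitespace mode: it splits on runs of
    # whitespace and ignores any leading whitespace
    my @fields = split ' ', $line;
    return substr($fields[2], 0, 1);
}

print first_char_of_third_field('A  1 HB2 MET  1 '), "\n";   # prints "H"
```

Unlike split('\s+', ...), the split ' ' form does not produce an empty first field when a line starts with whitespace.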

Files:

file1.txt (note I changed line two to ensure a non-match worked):

1  H  35
1  A  22
1  H  20

file2.txt:

A  1 HB2 MET  1 
A  2 CA  MET  1
A  3 HA  MET  1

Output:

./app.pl
A  1 HB2 MET  1 
**** >A  2 CA  MET  1< doesn't match >1  A  22<
A  3 HA  MET  1
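
The script above prints the matching file2.txt line as-is. If the goal is the four-column OUTPUT shown in the question, the same positional pairing can be extended to pick out individual fields. The following is only a sketch under some assumptions: the lines of the two files correspond one-to-one by position, the output is tab-separated, and the inline sample data stands in for reading the files. The helper name merge_rows is hypothetical:

```perl
use strict;
use warnings;

# Pair the lines of the two inputs by position and build the output rows.
# (merge_rows is a hypothetical helper; both arguments are array refs of lines.)
sub merge_rows {
    my ($shifts, $pdb) = @_;
    my @rows;
    for my $i (0 .. $#{$shifts}) {
        last if $i > $#{$pdb};    # stop when the PDB lines run out
        my ($res, $element, $value) = split ' ', $shifts->[$i];
        my (undef, undef, $atom, $resname) = split ' ', $pdb->[$i];
        # keep the pair only if the atom name starts with the element letter
        next unless substr($atom, 0, 1) eq $element;
        push @rows, join("\t", $res, $resname, $atom, $value);
    }
    return @rows;
}

# sample data copied from the question's File1.txt and File2.txt
my @shifts = ('1  H  35', '1  C  22', '1  H  20');
my @pdb    = ('A  1 HB2 MET  1 ', 'A  2 CA  MET  1', 'A  3 HA  MET  1');

print "$_\n" for merge_rows(\@shifts, \@pdb);
```

With the question's data, all three pairs match and the rows come out as residue number, residue name, atom name and shift value. Non-matching pairs are skipped silently here rather than reported.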
