
Find partial matches between two files in Perl

I want to write a Perl program that takes two input files. The first file has 2 columns of text: the first column is a label and the second column is the search string. The second file also has 2 columns: the first column is a label and the second column is the text to be searched. Matching should be done on the second columns, not on the labels. For example, John in file1 should be paired with Johni in file2, whose text contains John's search string, rather than with John in file2, whose text does not.

file1

John AABBBCCCDEE
Jam  WWQQQQQQQERRRTTTTTT

file2

Jami    EWWQQQQQQQERRRTTTTTTTTTT
Johni   AAAAABBBCCCDEEEEEEHHHHHH
Mark    WWWCCVVVVVVFFFFFFFTTTTTT
ROB     ##@@@########VVVVVVVVVVV
John    WWADFRWSSSSSSDDDDDqqqqqq

output

Jami    EWWQQQQQQQERRRTTTTTTTTTT    Jam  WWQQQQQQQERRRTTTTTT
Johni   AAAAABBBCCCDEEEEEEHHHHHH    John AABBBCCCDEE

I tried the following code but it doesn't work the way I want.

#!/usr/bin/perl
use warnings;
use strict;

my ($infile1) = $ARGV[0];
my ($infile2) = $ARGV[1];
open(my $fh1, "<$infile1");

while (my $file1 = <$fh1>) {
    my @file1 = split("\t| ", $file1);
    my $name_file1 = $file1[0];
    my $ID_file1 = $file1[1];
    my @matchline_file2 = `cat $infile2 | grep $name_file1`;
    for my $ID_file1 (@file1) {
        if (grep my $ID_file2 eq $ID_file1, @matchline_file2) {
            print "found\n";
        } else {
            print "not_found\n";
        }
    }
}

This prints the matches in file1 order, not in the reversed order shown in your expected output. I'm not sure whether that ordering was intentional; if it matters, you could store the results in an array and reverse or sort them before printing. Your example is very limited, so this is only a best guess at what you're trying to do.

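#!/usr/bin/perl
use warnings;
use strict;

my ($infile1, $infile2) = @ARGV;
die "Usage: $0 file1 file2\n" unless defined $infile2;

# Read all of file2 into one string so it can be searched with a single match.
my $search_file = "";
open(my $fh2, '<', $infile2) or die "Cannot open $infile2: $!\n";

while (my $line = <$fh2>)
{
   $search_file .= $line;
}
close($fh2);

open(my $fh1, '<', $infile1) or die "Cannot open $infile1: $!\n";

while (my $line = <$fh1>)
{
   chomp($line);

   # Skip the label and capture the rest of the line as the search string.
   if ($line =~ m/\w+\s+(.*)/)
   {
       # Escape regex metacharacters so the search string is matched literally.
       my $search_string = quotemeta($1);

       # "." does not match a newline, so $1 becomes the first whole line
       # of file2 that contains the search string.
       if ($search_file =~ m/(.*$search_string.*)/)
       {
          print "$1\t$line\n";
       }
       else
       {
          print "Could not find: $line\n";
       }
   }
   else
   {
      print "Invalid line: $line\n";
   }
}
close($fh1);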
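
If the order of the output matters, the idea of storing the results in an array could look roughly like the sketch below. It is only an illustration under a few assumptions: parse_line and find_matches are hypothetical helper names that are not part of the code above, the sample lines are hard-coded from the question instead of being read from the two files, and index() is used for a literal substring test in place of the regex match. Unlike the script above, it reports every file2 line that contains a search string, not just the first one.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: split a "label text" line into its two columns.
# Returns an empty list if the line does not have both columns.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return $line =~ /^(\S+)\s+(\S.*?)\s*$/ ? ($1, $2) : ();
}

# Hypothetical helper: for every file1 line whose search string occurs
# inside the text of a file2 line, collect "file2 entry<TAB>file1 entry".
sub find_matches {
    my ($queries, $targets) = @_;    # array refs of raw lines
    my @results;
    for my $query (@$queries) {
        my ($qlabel, $needle) = parse_line($query) or next;
        for my $target (@$targets) {
            my ($tlabel, $text) = parse_line($target) or next;
            # index() compares literally, so no quotemeta is needed here.
            push @results, "$tlabel $text\t$qlabel $needle"
                if index($text, $needle) >= 0;
        }
    }
    return @results;
}

# Sample data copied from the question; in practice these lines would be
# read from the two input files.
my @file1 = (
    "John AABBBCCCDEE",
    "Jam  WWQQQQQQQERRRTTTTTT",
);
my @file2 = (
    "Jami    EWWQQQQQQQERRRTTTTTTTTTT",
    "Johni   AAAAABBBCCCDEEEEEEHHHHHH",
    "Mark    WWWCCVVVVVVFFFFFFFTTTTTT",
    "ROB     ##@@@########VVVVVVVVVVV",
    "John    WWADFRWSSSSSSDDDDDqqqqqq",
);

# Results are collected in file1 order; reverse them before printing.
print "$_\n" for reverse find_matches(\@file1, \@file2);
```

With the sample data this lists the Jami/Jam pair before the Johni/John pair, which is the order shown in the question. Replacing reverse with sort would order the lines alphabetically instead.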


 