
Merge specified lines from multiple files using a Perl script

file_1.txt

$thread1 = new threads \&callfunc1,"1";
$thread2 = new threads \&callfunc1,"2";
$thread3 = new threads \&callfunc1,"3";
$thread4 = new threads \&callfunc1,"4";
$thread5 = new threads \&callfunc1,"5";
$thread6 = new threads \&callfunc1,"6";
$thread7 = new threads \&callfunc1,"7";
$thread8 = new threads \&callfunc1,"8";
$thread9 = new threads \&callfunc1,"9";
$thread10 = new threads \&callfunc1,"10";
$thread11 = new threads \&callfunc1,"11";
$thread12 = new threads \&callfunc1,"12";

file_2.txt

$thread13 = new threads \&callfunc2,"1";
$thread14 = new threads \&callfunc2,"2";
$thread15 = new threads \&callfunc2,"3";
$thread16 = new threads \&callfunc2,"4";
$thread17 = new threads \&callfunc2,"5";
$thread18 = new threads \&callfunc2,"6";

file_3.txt

$thread19 = new threads \&callfunc3,"1";
$thread20 = new threads \&callfunc3,"2";
$thread21 = new threads \&callfunc3,"3";

file_4.txt

$thread22 = new threads \&callfunc4,"1";
$thread23 = new threads \&callfunc4,"2";
$thread24 = new threads \&callfunc4,"3";

I have four files and need to merge them into a single file. In the new file, every odd line should come from file_1.txt and the even lines from file_2.txt, except that every 4th line comes from file_3.txt and every 8th line from file_4.txt.

merge.txt

$thread1 = new threads \&callfunc1,"1";
$thread13 = new threads \&callfunc2,"1";
$thread2 = new threads \&callfunc1,"2";
$thread19 = new threads \&callfunc3,"1";
$thread3 = new threads \&callfunc1,"3";
$thread14 = new threads \&callfunc2,"2";
$thread4 = new threads \&callfunc1,"4";
$thread22 = new threads \&callfunc4,"1";
$thread5 = new threads \&callfunc1,"5";
$thread15 = new threads \&callfunc2,"3";
$thread6 = new threads \&callfunc1,"6";
$thread20 = new threads \&callfunc3,"2";
$thread7 = new threads \&callfunc1,"7";
$thread16 = new threads \&callfunc2,"4";
$thread8 = new threads \&callfunc1,"8";
$thread23 = new threads \&callfunc4,"2";
$thread9 = new threads \&callfunc1,"9";
$thread17 = new threads \&callfunc2,"5";
$thread10 = new threads \&callfunc1,"10";
$thread21 = new threads \&callfunc3,"3";
$thread11 = new threads \&callfunc1,"11";
$thread18 = new threads \&callfunc2,"6";
$thread12 = new threads \&callfunc1,"12";
$thread24 = new threads \&callfunc4,"3";

I tried the code below, but it merges one line from each file in turn. Can anybody help me with this? Thanks in advance.

#merger
unlink "threadperl.txt";
my @files = ('file_1.txt','file_2.txt','file_3.txt','file_4.txt');
my @fh;

#create an array of open filehandles.
@fh = map { open my $f, $_ or die "Cant open $_:$!"; $f } @files;


open my $out_file, ">threadperl.txt" or die "can't open out_file: $!";

my $output;
do
{
    $output = '';
    foreach (@fh){

        my $line = <$_>;
        if (defined $line){
            #Special case: might not be a newline at the end of the file
            #add a newline if none is found.
            $line .= "\n" if ($line !~ /\n$/);
            $output .= $line;
        }
    }

    print {$out_file} $output;
}
while ($output ne '');

You didn't specify how you want the files merged, so I'm assuming the selected lines are assembled consecutively.

First, read each file into an array:

    open my $handle, '<', "file_1.txt" or die "Can't open file_1.txt: $!";
    chomp(my @file1 = <$handle>);
    close $handle;

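Then select the lines you want with an array slice, using "grep" over the indexes of the array (grep is like an inline filtering loop). Arrays are zero-based, so lines 1, 3, 5, ... sit at indexes 0, 2, 4, ... (this assumes @file2 was read the same way as @file1):

    my @odd_lines  = @file1[grep { $_ % 2 == 0 } 0 .. $#file1];
    my @even_lines = @file2[grep { $_ % 2 == 1 } 0 .. $#file2];

Then you can print both arrays, one after the other:

    open my $out, '>', 'merge.txt' or die "Can't open merge.txt: $!";
    print {$out} "$_\n" for @odd_lines, @even_lines;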
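Note that the sample merge.txt in the question interleaves the files rather than concatenating them: the output slots repeat the order file 1, 2, 1, 3, 1, 2, 1, 4. A minimal sketch of that interleaving, assuming this repeating pattern is what the asker intends and that exactly four files are supplied; the helper names interleave() and read_lines() are made up for illustration:

```perl
use strict;
use warnings;

# Slot pattern inferred from the sample merge.txt in the question:
# the output repeats file 1, 2, 1, 3, 1, 2, 1, 4 (indexes 0..3 below).
my @pattern = (0, 1, 0, 2, 0, 1, 0, 3);

# interleave() is a hypothetical helper: it takes one array ref of lines
# per input file (four in total) and returns the merged list.
sub interleave {
    my @queues = map { [@$_] } @_;    # copy so the caller's arrays survive
    my @merged;
    my $slot = 0;
    while (grep { @$_ } @queues) {    # loop while any queue has lines left
        my $queue = $queues[ $pattern[ $slot++ % @pattern ] ];
        push @merged, shift @$queue if @$queue;    # skip exhausted files
    }
    return @merged;
}

# read_lines() is also hypothetical: slurp one file, minus newlines.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Only touch the disk when all four input files are actually present.
my @files   = map { "file_$_.txt" } 1 .. 4;
my @missing = grep { !-e $_ } @files;
if (!@missing) {
    open my $out, '>', 'merge.txt' or die "Can't open merge.txt: $!";
    print {$out} "$_\n" for interleave(map { read_lines($_) } @files);
    close $out or die "Can't close merge.txt: $!";
}
```

When a file runs out of lines its slots are simply skipped, so the remaining files keep their relative order. Changing @pattern is enough to try a different interleaving.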

Just for fun, I wanted to see what it might look like with the filtering logic pulled out of the read loop. This approach also doesn't slurp each file into memory, so it can handle much larger input files, and it is easy to extend with more input files or different filtering logic.

The filtering logic is terse; see the longer-form example in the comments after the file definitions.

#!/usr/bin/perl

use strict;
use warnings;

my $debug = 0;

my @inFiles = (
   { fileName=>"file_1.txt", label=>"odd",  filter=>sub { ( shift->{lineCnt} % 2 ) != 0 } },
   { fileName=>"file_2.txt", label=>"even", filter=>sub { ( shift->{lineCnt} % 2 ) == 0 } },
   { fileName=>"file_3.txt", label=>"4th",  filter=>sub { ( shift->{lineCnt} % 4 ) == 0 } },
   { fileName=>"file_4.txt", label=>"8th",  filter=>sub { ( shift->{lineCnt} % 8 ) == 0 } }
   # Ok to add additional files here if desired, ok to use other filtering "logic".
   # For example, we could teach capture() to add the current line to a given $inFile,
   # then you could write "filters" subroutines that did pattern matching as well.
   # { fileName=>"file_4.txt",  # Path to input file
   #   label=>"8th",            # more or less a comment to describe the filter's goal.
   #   filter=>sub {            # read logic calls this to see if we should keep a line.
   #      # This is a more verbose version of how the filter logic works.
   #      # I want to point out you can get fairly complex, and include debug prints
   #      # in here.  Also just leaving it at "shift->{..." is a bit opaque.
   #      my $hash = shift;
   #      my $curLineNumber = $hash->{lineCnt};
   #      my $result = ( $curLineNumber % 8 ) == 0;
   #      print "$hash->{fileName}.$curLineNumber: label=$hash->{label}, result=$result\n";
   #      return $result;
   #   }
   #  }
);

# Initialize our files.
# Since we are keeping everything we know about an input file
# in a HASH, we'll add some new keys here to make life easier.
foreach my $inFile  ( @inFiles ) {
   # $inFile is a hash ref for each of the file1 file2 etc.
   my $name = $inFile->{fileName}; # just a shortcut, we'll use name a lot so easier to read.
   -e $name || die "input file $name does not exist.";
   -f $name || die "input file $name is not a regular file.";
   # our first new key will be the file handle - we'll use this later for reading.
   # Parentheses matter here: without them "||" binds to $name and die never runs.
   open( $inFile->{handle}, "<", $name ) || die "open $name for reading: $!";
   $inFile->{lineCnt} = 0; # another new key, count how many lines we have read from this file.
   $inFile->{filterCnt} = 0; # also count how many times our filter answers true.
   print "opened input file $inFile->{fileName}, label=$inFile->{label}\n" if $debug;
}

my $readCnt; # track how much (if anything) we read.
do {
   $readCnt = 0; # assume we read nothing this time.
   foreach my $inFile  ( @inFiles ) {
      $readCnt += capture( $inFile ); # may have read something...
   }
} while( $readCnt >= 1 ); # so long as we read something try again.

print "Data reading completed, closing input files...\n";
my $totalHits = 0;
foreach my $inFile  ( @inFiles ) {
   close($inFile->{handle}) || warn "Ignoring error closing input file $inFile->{fileName}: $!";
   $totalHits += $inFile->{filterCnt};
   printf "\tfile: %12s  <%6s> #lines: %4d #hits: %4d\n",
      $inFile->{fileName},
      $inFile->{label},
      $inFile->{lineCnt},
      $inFile->{filterCnt};
}
print "Done.  Total hits=$totalHits\n";


sub capture {
   my $inFile = shift;
   my $line;
   my $readCnt = 0;
   my $handle = $inFile->{handle};
   if( defined( $line = <$handle> ) ) { # defined() so a final line of "0" is not mistaken for EOF
      ++$inFile->{lineCnt};
      ++$readCnt;  # lets our caller know not out of data.
      my $filter = $inFile->{filter}; # get our filtering subroutine
      my $filterResult = $filter->( $inFile ); # invoke the subroutine
      printf "%s.%03d: <%5s> filterResult=%s\n", $inFile->{fileName},$inFile->{lineCnt}, $inFile->{label}, $filterResult if $debug;
      if( $filterResult  ) {
         ++$inFile->{filterCnt}; # count how many times the filter hits.
         print "$inFile->{fileName}.$inFile->{lineCnt}: $line";
         # you could write this to wherever you want it.
      }
   } else {
      # no more data for this input file, nothing to do.
   }
   return $readCnt; # will be 0 or 1
}
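The comments in the file definitions mention that the read logic could store the current line on the file's hash so that filters can do pattern matching. A minimal sketch of that idea, assuming a new "line" key and a stand-in helper named capture_line() (both made up here, not part of the script above), using in-memory sample lines instead of real files:

```perl
use strict;
use warnings;

# Hypothetical variation: each filter receives the whole file record,
# which now carries the current line under a new "line" key.
my @inFiles = (
   { label => "callfunc2", filter => sub { shift->{line} =~ /callfunc2/ } },
   { label => "every 4th", filter => sub { shift->{lineCnt} % 4 == 0 } },
);

# capture_line() is a hypothetical stand-in for capture(): it records the
# line on the hash before calling the filter, and returns the line on a
# hit (or an empty list on a miss).
sub capture_line {
   my ( $inFile, $line ) = @_;
   $inFile->{line} = $line;
   ++$inFile->{lineCnt};
   return $inFile->{filter}->( $inFile ) ? $line : ();
}

# Made-up sample data, fed to every filter in turn.
my @sample = ( "callfunc1 a\n", "callfunc2 b\n", "callfunc1 c\n", "callfunc2 d\n" );
for my $inFile ( @inFiles ) {
   my @hits = map { capture_line( $inFile, $_ ) } @sample;
   print "$inFile->{label}: $_" for @hits;
}
```

Because the filter sees the whole hash, it can combine the line counter and the line text, for example matching a pattern only on even-numbered lines.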
