
How to improve Perl script performance?

I am running the ucm2.pl script to scan a huge directory structure (the directory is a network drive mapped locally). I have two Perl scripts, ucm1.pl and ucm2.pl. I run ucm2.pl in parallel for different arguments; it is called from ucm1.pl.

ucm1.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    # intfSplitList.txt lists all the input files (intfSplit_0 .. intfSplit_50)
    my $filename = "intfSplitList.txt";

    # Count the lines so we know how many split files there are
    my $lines = 0;
    open my $count_fh, '<', $filename or die "Can't open '$filename': $!";
    $lines++ while <$count_fh>;
    close $count_fh;
    print "The number of lines in $filename is $lines\n";

    # Allow up to $lines parallel processes (one per split file)
    my $pm = Parallel::ForkManager->new($lines);

    open my $fh, '<', $filename or die "Can't open '$filename': $!";
    while (my $data = <$fh>) {
        chomp $data;

        my $pid = $pm->start and next;    # fork; the parent continues the loop

        # Call ucm2.pl: input.txt holds the search keywords and
        # $data is one of the intfSplit_*.txt files
        system("perl ucm2.pl -iinput.txt -f$data");

        $pm->finish;    # terminate the child process
    }
    close $fh;
    $pm->wait_all_children;    # don't exit before the children are done

ucm2.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use Getopt::Std;

    # Read the command-line options
    getopts('i:f:');
    our ($opt_i, $opt_f);

    my $searchKeyword = $opt_i;        # search keyword file
    my $intfSplit     = $opt_f;        # split file
    my $path          = "Z:/aims/";    # source directory
    my $searchString;                  # current search keyword

    # Note: every parallel instance opens the same log.txt, so they will
    # overwrite each other's log; a per-split name would avoid that.
    open my $log, '>', 'log.txt' or die "Can't open log.txt: $!";
    print $log "$intfSplit started at " . localtime() . "\n";

    # Open the result file once; reopening it with '>' inside the keyword
    # loop (as before) truncated it on every iteration.
    open my $out, '>', "vob$intfSplit.txt" or die "Can't open vob$intfSplit.txt: $!";

    open my $split, '<', $intfSplit or die "Can't open $intfSplit: $!";
    while (my $intf = <$split>) {
        chomp $intf;
        my $dir = $path . $intf;
        print "$dir\n";

        # Open the search keyword file and scan $dir for each keyword
        open my $inp, '<', $searchKeyword or die "Can't open $searchKeyword: $!";
        while ($searchString = <$inp>) {
            chomp $searchString;
            print "$searchString\n";

            # The subroutine prints the path of every file under $dir
            # that contains the keyword
            find(\&printFile, $dir);
        }
        close $inp;
    }
    close $split;

    print $log "$intfSplit ended at " . localtime() . "\n";

    # Defined at file scope: a named sub nested inside a loop does not
    # share the loop's lexicals ("variable will not stay shared").
    sub printFile {
        my $element = $_;
        return unless -f $element;    # the original /\.*$/ matched every name

        open my $in, '<', $element or die "Can't open $element: $!";
        while (<$in>) {
            if (/\Q$searchString\E/) {
                my $last_update_time = (stat $element)[9];
                my $timestamp        = localtime($last_update_time);
                print $out "$File::Find::name     $timestamp     $searchString\n";
                last;
            }
        }
    }

Everything runs fine, but it takes a very long time even for a single-keyword search. Can anyone suggest a better way to improve the performance?

Thanks in advance!!

Running multiple instances of Perl adds a lot of unnecessary overhead. Have you looked at my answer to your previous question, which suggested changing this?
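Roughly sketched: if the search logic from ucm2.pl is moved into a sub (called search_split here, a name made up for illustration), each child can call it directly instead of launching a fresh perl interpreter:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(8);    # a fixed cap, not one process per line

    sub search_split {
        my ($split) = @_;
        # ... the directory-scanning logic from ucm2.pl goes here ...
    }

    open my $list, '<', 'intfSplitList.txt' or die $!;
    while (my $split = <$list>) {
        chomp $split;
        $pm->start and next;      # fork a child for this split file
        search_split($split);     # run the search in-process, no system() call
        $pm->finish;
    }
    close $list;
    $pm->wait_all_children;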

Also, as I mentioned previously, you have some unnecessary repetition here: there is no reason to open and process your search keyword file multiple times. You can write one sub that opens the keyword file and puts the keywords in an array, then pass those keywords to another sub that does the searching, as sketched below.
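Something like this, where read_keywords is an illustrative name and $searchKeyword is the keyword-file variable from ucm2.pl:

    # Read the keyword file once, up front, instead of re-reading it
    # for every directory
    sub read_keywords {
        my ($file) = @_;
        open my $fh, '<', $file or die "Can't open $file: $!";
        my @keywords = map { chomp; $_ } <$fh>;
        close $fh;
        return @keywords;
    }

    my @keywords = read_keywords($searchKeyword);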

You can make a search for multiple keywords much faster by searching for them all at once. Do something like this to get your keywords:

    my @keywords = map { chomp; $_ } <$fh>;
    my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";

Now you have a single regex like this: (\Qkeyword1\E|\Qkeyword2\E). You only have to search the files once, and if you want to see which keyword matched, just check the content of $1. This won't speed things up for a single keyword, but searching for many keywords will be nearly as fast as searching for a single one.
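Combined with the File::Find callback from ucm2.pl, the per-file check could look like this (a sketch; $out is the result filehandle opened once, and @keywords comes from the snippet above):

    my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";

    sub printFile {
        my $element = $_;
        return unless -f $element;

        open my $in, '<', $element or die "Can't open $element: $!";
        while (<$in>) {
            if (/$regex/) {
                # $1 holds whichever keyword matched on this line
                print $out "$File::Find::name     $1\n";
                last;    # first match per file is enough, as before
            }
        }
    }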

Ultimately, though, if you are searching a huge directory structure on the network, there may be a limit to how much you can speed things up.

Update: corrected the chomping. Thanks amon.
