
Parallel::ForkManager makes subroutine 1000x slower

I have a subroutine, which I've serially optimized as much as I can, approximately like

sub overlap {

    my $hash_reference = shift;   # pass the hash to the subroutine
    my %h = %{ $hash_reference }; # refer to the hash as %h
    my $standard = shift;         # this is the key that will be compared against
    my $compared = shift;         # this is the key being compared
    my $start_index = 0;          # this will continually be increased
                                  # to save computation time

    # I want to parallelize here

    foreach my $s ( 0 .. scalar @{ $h{$standard}{end} }-1 ) {
        foreach my $c ( $start_index .. scalar @{ $h{$compared}{end} }-1 ) {
            ... # abbreviated for minimal working example
        }
    }

    return ($standard_indices_met_in_compared, \@overlay);
}

This subroutine is slow: running it the thousands of times I need takes about 12-14 minutes, and doing that serially over and over wastes time.

I regularly use Parallel::ForkManager for system processes, but this doesn't work well here.

My implementation with Parallel::ForkManager looks like this:

use Parallel::ForkManager qw();
my $manager = new Parallel::ForkManager(2);
foreach my $s ( 0 .. scalar @{ $h{$standard}{end} }-1 ) {

    foreach my $c ( $start_index .. scalar @{ $h{$compared}{end} }-1 ) {
        $manager->start and next;
        ... # abbreviated for minimal working example
    }

    $manager->finish;
}

$manager->wait_all_children;      # necessary after all lists

I've looked at threads and similar approaches, but I do not see how to apply them here.

I have looked at Perl multithreading and foreach, the Perl documentation for threads, and numerous other sources, but I don't see how to apply what has been done before to this case. Everything I find looks like it is for system commands only.

I want to write to a shared array and a shared scalar, with no system commands. If I'm missing something, please tell me.

How can I parallelize this foreach loop inside a subroutine?

Are you really trying to parallelize with a maximum of two processes only? If so, this may be the source of the perceived slowness.

There will always be an overhead associated with parallelization. You cannot guarantee a 10x speed-up if you parallelize over 10 processes.

I suggest you open up the maximum number of processes to something more reasonable and try again (see the sketch after the list below). If this does not help, it may be due to:

  • hardware limitations
  • something about the loop you are trying to parallelize that forces sequential execution (e.g. writing to the same file or a DB table, updating a semaphored or shared variable, ...)
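
For illustration, here is a minimal sketch of what a more reasonable limit could look like. The fallback value of 8 is an assumption, and the nproc probe is only a convenience that works where that command exists (e.g. on Linux):

use strict;
use warnings;
use Parallel::ForkManager;

# Try to match the number of CPU cores; fall back to an assumed 8.
my $max_procs = 8;
my $n = `nproc 2>/dev/null`;      # Linux-only convenience probe
if ( defined $n ) {
    chomp $n;
    $max_procs = $n if $n =~ /^\d+$/ && $n > 0;
}

my $manager = Parallel::ForkManager->new($max_procs);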

Now that we can see the Parallel::ForkManager part, I'd like to address a direct error in what is shown, already noted in a comment by ysth.

With the loop bodies only indicated for clarity, and with a more meaningful process limit, you have

use Parallel::ForkManager;
my $manager = Parallel::ForkManager->new(8);

foreach my $s ( ... )
{    
    foreach my $c ( ... ) 
    {
        $manager->start and next;    # 
        # code                       # WRONG
    }                                # Module: Can't fork inside child
    $manager->finish;                #
}
$manager->wait_all_children;

Let's see what this attempts to do.

A child is forked inside the inner loop, but it exits outside of it, meaning that it runs the rest of the whole inner loop. So each child would also execute the line that creates new children, along with the parent. That would be a real mess, with a cascade of children and a wrong partition of the work between them.

But the module just doesn't allow this and throws an error. Is your real code different from what is shown?

Now consider

foreach my $s ( ... ) 
{    
    $manager->start and next;     # child forked

    foreach my $c ( ... ) 
    {                             # Whole inner loop
        # code                    # run by one child
    }                             # for one value of $s

    $manager->finish;             # child exits
}    

A fork happens outside of the inner loop, and the child proceeds to run the whole inner loop with the current value of $s. The parent skips to the next iteration of the outer loop and forks another child, which runs the inner loop for that next value of $s. Each child runs the whole inner loop for one of the subsequent values of $s, so the iterations of the outer loop are executed in parallel.

This is what you want. So change your code to do this and see how it goes.
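
To make that concrete, here is a minimal sketch of how the corrected structure could look inside your overlap subroutine, including one way to get data back to the parent. Forked children cannot write to the parent's variables directly, so this uses the module's run_on_finish callback together with finish($exit_code, \%data) to return each child's results. The names %results, @overlay_part, $indices_met, and the shape of the returned hash are my assumptions standing in for your abbreviated code:

use strict;
use warnings;
use Parallel::ForkManager;

my $manager = Parallel::ForkManager->new(8);

my %results;    # filled in by the parent, one entry per value of $s

# Runs in the parent whenever a child exits; the last argument is the
# data structure that the child passed to finish().
$manager->run_on_finish( sub {
    my ($pid, $exit_code, $s, $exit_signal, $core_dump, $data) = @_;
    $results{$s} = $data if defined $data;
});

foreach my $s ( 0 .. scalar @{ $h{$standard}{end} } - 1 ) {
    $manager->start($s) and next;    # child forked; parent moves on

    my @overlay_part;                # this child's share of @overlay (assumed)
    my $indices_met = 0;             # this child's partial count (assumed)

    foreach my $c ( $start_index .. scalar @{ $h{$compared}{end} } - 1 ) {
        ...                          # your abbreviated inner-loop code
    }

    # Ship this child's results back to the parent.
    $manager->finish(0, { indices_met => $indices_met, overlay => \@overlay_part });
}

$manager->wait_all_children;

Note that any update a child makes to $start_index is not seen by the parent or by the other children, since each child is a separate process; whether that matters depends on the code you abbreviated away.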

To repeat what has been said: not all code benefits equally from being run in parallel. Some code cannot run correctly in parallel at all, and some may suffer a noticeable performance drop.
