
Perl - parallel programming - running two external programs

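I have a Perl script that runs two external programs, one dependent on the other, over a series of datasets. Currently, I process the datasets one at a time: I run each through the first program, collect the results with qx, and then use those results to run the second program. The second program's results are appended to an output file, one file per dataset. I've created a simple reproducible example that hopefully captures my current approach: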

#!/usr/bin/perl
#
# stackoverflow_q_7-7-2016.pl

use warnings;
use strict;

my @queries_list = (2, 4, 3, 1);

foreach my $query (@queries_list) {
    #Command meant to simulate the first, shorter process, and return a list of results for the next process
    my $cmd_1 = "sleep " . $query . "s; shuf -i 4-8 -n 3";
    print "Running program_1 on query $query...\n";
    my @results = qx($cmd_1);

    foreach (@results) {
        chomp $_;
        #Command meant to simulate a longer process whose input depends on program_1; the output I write to a separate file for each query
        my $cmd_2 = "sleep " . $_ . "s; fortune -s | head -c " . $_ * 5 . " >> $query.output";
        print "\tRunning program_2 on query $query with input param $_...\n";
        system($cmd_2);
    }
}

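Since the first program usually finishes faster than the second, I think it may be possible to speed up the whole process by continuing to run new queries through program_1 while program_2 is still running on the previous query. Speeding this up would be great, because it currently takes many hours to finish processing. However, I'm not sure how to go about it. Would something like Parallel::ForkManager be the solution? Or using threads in Perl?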

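Now, in my actual code, I do some error handling and set a timeout for program_2. I use fork, exec, and $SIG{ALRM} to do this, but I don't really know what I'm doing with them. It's important that I keep this ability, otherwise program_2 could get stuck or fail without adequately reporting why. Below is the code with the error handling. I don't think it works the way it should in the reproducible example, but hopefully you can at least see what I'm trying to do: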

#!/usr/bin/perl
#
# stackoverflow_q_7-7-2016.pl

use warnings;
use strict;

my @queries_list = (2, 4, 3, 1);

foreach my $query (@queries_list) {
    #Command meant to simulate the first, shorter process, and return a list of results for the next process
    my $cmd_1 = "sleep " . $query . "s; shuf -i 4-15 -n 3";
    print "Running program_1 on query $query...\n";
    my @results = qx($cmd_1);

    foreach (@results) {
        chomp $_;
        #Command meant to simulate a longer process whose input depends on program_1; the output I write to a separate file for each query
        my $cmd_2 = "sleep " . $_ . "s; fortune -s | head -c " . $_ * 3 . " >> $query.output";
        print "\tRunning program_2 on query $query with input param $_...\n";

        my $childPid;
        eval {
            local $SIG{ALRM} = sub { die "Timed out" };
            alarm 10;
            if ($childPid = fork()) {
                wait();
            } else {
                exec($cmd_2);
            }
            alarm 0;
        };
        if ($? != 0) {
            my $exitCode = $? >> 8;
            print "Program_2 exited with error code $exitCode. Retry...\n";
        }
        if ($@ =~ /Timed out/) {
            print "\tProgram_2 timed out. Skipping...\n";
            kill 2, $childPid;
            wait;
        };
    }
}

Any help is appreciated.

One solution:

use threads;

use Thread::Queue;  # 3.01+

sub job1 { ... }
sub job2 { ... }

{
   my $job1_request_queue = Thread::Queue->new();
   my $job2_request_queue = Thread::Queue->new();

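   my $job1_thread = async {
      # defined() keeps a false-valued job (such as 0) from ending the loop early.
      while (defined(my $job = $job1_request_queue->dequeue())) {
         my $result = job1($job);
         $job2_request_queue->enqueue($result);
      }

      # job1 is finished, so no more job2 requests will be produced.
      $job2_request_queue->end();
   };

   my $job2_thread = async {
      while (defined(my $job = $job2_request_queue->dequeue())) {
         job2($job);
      }
   };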

   $job1_request_queue->enqueue($_) for ...;

   $job1_request_queue->end();    
   $_->join() for $job1_thread, $job2_thread;
}
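The pipeline above leaves job1 and job2 as stubs. Here is a minimal runnable sketch of how they might be filled in, under some assumptions: the two child commands are hypothetical stand-ins for program_1 and program_2 (small perl one-liners rather than the real programs), capture_lines is a helper invented for this example, and job2 returns a string instead of appending to a per-query file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;  # 3.01+ for end()

# Hypothetical helper: run a command given as a list and return its output lines.
sub capture_lines {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!";
    chomp(my @lines = <$fh>);
    close($fh) or warn "Command exited with status $?\n";
    return @lines;
}

# Stand-in for program_1: produces three parameters for a query.
sub job1 {
    my ($query) = @_;
    my @params = capture_lines($^X, '-e',
        'printf "%d\n", $ARGV[0] + $_ for 1 .. 3', $query);
    return map { [ $query, $_ ] } @params;   # one job2 request per parameter
}

# Stand-in for program_2: doubles the parameter it is given.
sub job2 {
    my ($job) = @_;
    my ($query, $param) = @$job;
    my ($output) = capture_lines($^X, '-e', 'printf "%d", $ARGV[0] * 2', $param);
    return "query $query, param $param -> $output";
}

my $job1_request_queue = Thread::Queue->new();
my $job2_request_queue = Thread::Queue->new();
my $done_queue         = Thread::Queue->new();

my $job1_thread = async {
    while (defined(my $query = $job1_request_queue->dequeue())) {
        $job2_request_queue->enqueue(job1($query));
    }
    $job2_request_queue->end();
};

my $job2_thread = async {
    while (defined(my $job = $job2_request_queue->dequeue())) {
        $done_queue->enqueue(job2($job));
    }
    $done_queue->end();
};

$job1_request_queue->enqueue(2, 4, 3, 1);
$job1_request_queue->end();
$_->join() for $job1_thread, $job2_thread;

# Both threads are done, so this drains the finished results without blocking.
my @lines;
while (defined(my $line = $done_queue->dequeue())) {
    push @lines, $line;
}
print "$_\n" for @lines;
```

With one worker per stage, job2 requests are processed in the order job1 produced them, while job1 is already free to start on the next query.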

You can even have multiple workers of either or both types.

use threads;

use Thread::Queue;  # 3.01+

use constant NUM_JOB1_WORKERS => 1;
use constant NUM_JOB2_WORKERS => 3;

sub job1 { ... }
sub job2 { ... }

{
   my $job1_request_queue = Thread::Queue->new();
   my $job2_request_queue = Thread::Queue->new();

   my @job1_threads;
   for (1..NUM_JOB1_WORKERS) {
      push @job1_threads, async {
         # defined() keeps a false-valued job (such as 0) from ending the loop early.
         while (defined(my $job = $job1_request_queue->dequeue())) {
            my $result = job1($job);
            $job2_request_queue->enqueue($result);
         }
      };
   }

   my @job2_threads;
   for (1..NUM_JOB2_WORKERS) {
      push @job2_threads, async {
         while (defined(my $job = $job2_request_queue->dequeue())) {
            job2($job);
         }
      };
   }

   $job1_request_queue->enqueue($_) for ...;

   $job1_request_queue->end();    
   $_->join() for @job1_threads;
   $job2_request_queue->end();
   $_->join() for @job2_threads;
}
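With several job2 workers, results for the same query can finish in any order. One way to deal with that is to tag each result with its query and regroup in the main thread. The sketch below illustrates only that step: job2 here is a hypothetical pure-Perl stand-in (it squares its parameter), and the result queue and the hard-coded request list are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;  # 3.01+ for end()

use constant NUM_JOB2_WORKERS => 3;

# Hypothetical stand-in for program_2: returns the result tagged with its query.
sub job2 {
    my ($job) = @_;
    my ($query, $param) = @$job;
    return [ $query, $param, $param ** 2 ];
}

my $job2_request_queue = Thread::Queue->new();
my $result_queue       = Thread::Queue->new();

my @job2_threads;
for (1 .. NUM_JOB2_WORKERS) {
    push @job2_threads, async {
        while (defined(my $job = $job2_request_queue->dequeue())) {
            $result_queue->enqueue(job2($job));
        }
    };
}

# Requests as [query, parameter] pairs, as a job1 stage might produce them.
$job2_request_queue->enqueue([2, 4], [2, 5], [4, 6], [4, 7], [3, 8]);
$job2_request_queue->end();
$_->join() for @job2_threads;
$result_queue->end();   # all workers have finished, so nothing more will arrive

# Regroup by query; arrival order across workers is not guaranteed.
my %results_by_query;
while (defined(my $result = $result_queue->dequeue())) {
    my ($query, $param, $value) = @$result;
    push @{ $results_by_query{$query} }, "$param:$value";
}

for my $query (sort { $a <=> $b } keys %results_by_query) {
    my @sorted = sort @{ $results_by_query{$query} };
    print "query $query: @sorted\n";
}
```

Sorting within each query makes the printed output independent of which worker finished first.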

Use IPC::Run instead of qx to add a timeout. No need for signals.
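As a rough illustration of the IPC::Run approach, here is a minimal sketch. It assumes IPC::Run is installed from CPAN. run_with_timeout and its return format are hypothetical names invented for this example, and the child commands are perl one-liners standing in for program_2. It relies on run() raising an exception whose message contains "timeout" when the timer expires, as the IPC::Run documentation describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Run qw(run timeout);

# Hypothetical wrapper: run a command (given as a list) with a time limit.
# Returns a hash with status ('ok', 'failed', or 'timeout'), stdout and stderr.
sub run_with_timeout {
    my ($seconds, @cmd) = @_;
    my ($in, $out, $err) = ('', '', '');

    my $ok;
    my $completed = eval {
        # run() returns true only if the command exited with status 0.
        $ok = run(\@cmd, \$in, \$out, \$err, timeout($seconds));
        1;
    };

    if (!$completed) {
        # Re-throw anything that is not the timeout exception.
        die $@ unless $@ =~ /timeout/;
        return (status => 'timeout', stdout => $out, stderr => $err);
    }

    return (status => ($ok ? 'ok' : 'failed'), stdout => $out, stderr => $err);
}

# Usage: a command that finishes in time, then one that exceeds the limit.
my %quick = run_with_timeout(10, $^X, '-e', 'print "done"');
print "quick: $quick{status} ($quick{stdout})\n";

my %stuck = run_with_timeout(1, $^X, '-e', 'sleep 10');
print "stuck: $stuck{status}\n";
```

Capturing stderr separately keeps the failure reason available for reporting, which the question lists as a requirement.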

