
AMPHP - Queueing more Tasks than available Workers in Pool

I have a project in which I am converting a large number of .tif images into PDF documents. The file count goes into the millions.

To speed up the process I am using Amphp. Since converting the images with ImageMagick uses a fair amount of CPU, I want to limit the maximum number of converter processes running in parallel.

My first approach works, but it could be improved if I queued the files instead of giving a fixed number of workers an array of x files each.

This is my current code, in which I tried to replicate the example:

<?php
require dirname(__DIR__) . '/vendor/autoload.php';

// THREAD_CHUNKSIZE, THREAD_COUNT, LOOKUP_PATH and ConvertPdfTask are defined elsewhere (not shown).
// All user-defined constants are handed to each task.
$constants = get_defined_constants(true);
$constants = $constants['user'];
$maxFileCount = THREAD_CHUNKSIZE * THREAD_COUNT;
$i = 0;
$folder = opendir(LOOKUP_PATH);
$tasks = [];

// Read up to $maxFileCount directory entries, creating one task per .xml file.
while ($i < $maxFileCount && (false !== ($import_file = readdir($folder)))) {
    $fileParts = explode('.', $import_file);
    $ext = strtolower(end($fileParts));
    if ($ext === 'xml') {
        $filePath = LOOKUP_PATH . 'xml' . DIRECTORY_SEPARATOR . $import_file;
        $tasks[] = new ConvertPdfTask([$filePath], $constants);
    }
    $i++;
}
closedir($folder);

if (!empty($tasks)) {
    Amp\Loop::run(function () use ($tasks) {
        $coroutines = [];
        // Pool limited to THREAD_COUNT workers.
        $pool = new Amp\Parallel\Worker\DefaultPool(THREAD_COUNT);
        foreach ($tasks as $task) {
            // Enqueue every task at once; each call returns a promise for the task's result.
            $coroutines[] = Amp\call(function () use ($pool, $task) {
                return yield $pool->enqueue($task);
            });
        }
        // Wait until every task has finished (fails as soon as one task fails).
        $results = yield Amp\Promise\all($coroutines);

        return yield $pool->shutdown();
    });
}
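In the loop above, $i is incremented for every entry readdir() returns, including ".", ".." and non-XML files, so a run can create fewer than $maxFileCount tasks. A minimal sketch, assuming only the PHP standard library, of reading the directory lazily with generators so that files can be handed out one at a time instead of as a prepared array; xmlFilesIn and take are hypothetical helper names, and the extension check simply mirrors the .xml filter above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily yields the full path of every regular file in
 * $dir whose extension matches $extension (case-insensitive). Only one
 * directory entry is held at a time, so huge directories are never copied
 * into an array.
 */
function xmlFilesIn(string $dir, string $extension = 'xml'): Generator
{
    $handle = opendir($dir);
    if ($handle === false) {
        throw new RuntimeException("Cannot open directory: $dir");
    }
    try {
        while (false !== ($entry = readdir($handle))) {
            $path = $dir . DIRECTORY_SEPARATOR . $entry;
            // Skips ".", "..", subdirectories and files with other extensions.
            if (is_file($path)
                && strtolower(pathinfo($entry, PATHINFO_EXTENSION)) === strtolower($extension)) {
                yield $path;
            }
        }
    } finally {
        // Runs when the loop ends or when the generator is destroyed early.
        closedir($handle);
    }
}

/** Hypothetical helper: passes through at most $limit items, still lazily. */
function take(iterable $items, int $limit): Generator
{
    if ($limit <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$limit === 0) {
            return;
        }
    }
}
```

For example, foreach (take(xmlFilesIn($dir), $maxFileCount) as $filePath) could drive the task creation, with $dir set to whichever folder actually holds the XML files; the limit then counts matching files rather than raw directory entries.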

My problem is that as soon as I enqueue more tasks than THREAD_COUNT, I get the following PHP warning, and no PDFs are created:

Warning: Worker in pool exited unexpectedly with code -1

As long as I stay below the maximum pool size, everything works fine.

I am using PHP 7.4.9 on Windows 10 with amphp/parallel 1.4.0.

After some more experimenting I found a solution that seems to work. It feels a bit "hacky", so if anyone has a better idea, please share it. I thought the pool would automatically build up a queue that is then processed by the maximum number of workers, but that does not seem to be the case.
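For comparison, the behaviour expected here (a pool that accepts any number of tasks, keeps the surplus in a FIFO queue and starts each one as soon as one of at most N slots frees up) can be modelled in plain PHP. This is a minimal sketch, not Amphp code: it swaps in cooperative scheduling with generators, so nothing actually runs in parallel, and runBounded and $makeTask are hypothetical names. Each task is a generator that yields whenever it would be waiting.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, single-threaded scheduler: runs generator-based tasks with at
 * most $maxConcurrent of them active at once. Extra tasks wait in a FIFO queue
 * and start as soon as a slot frees up. A task's return value becomes its
 * result; results are keyed by task index.
 */
function runBounded(array $taskFactories, int $maxConcurrent): array
{
    if ($maxConcurrent < 1) {
        throw new InvalidArgumentException('At least one slot is required');
    }
    $queue = new SplQueue();
    foreach ($taskFactories as $index => $factory) {
        $queue->enqueue([$index, $factory]);
    }

    $running = [];  // index => Generator occupying a slot
    $results = [];

    while (!$queue->isEmpty() || $running !== []) {
        // Fill free slots from the head of the queue.
        while (count($running) < $maxConcurrent && !$queue->isEmpty()) {
            [$index, $factory] = $queue->dequeue();
            $running[$index] = $factory();  // created, not started yet
        }
        // Advance every running task by one step; retire the finished ones.
        foreach ($running as $index => $task) {
            if ($task->valid()) {  // valid() also starts a fresh generator
                $task->next();
            }
            if (!$task->valid()) {
                $results[$index] = $task->getReturn();
                unset($running[$index]);
            }
        }
    }

    return $results;
}

// Demo: five tasks needing 3, 1, 2, 1 and 1 steps, at most two at a time.
$active = 0;
$peak = 0;
$startOrder = [];
$makeTask = function (int $id, int $steps) use (&$active, &$peak, &$startOrder): Closure {
    return function () use ($id, $steps, &$active, &$peak, &$startOrder): Generator {
        $active++;
        $peak = max($peak, $active);
        $startOrder[] = $id;
        for ($i = 0; $i < $steps; $i++) {
            yield;  // pretend to wait for a child process
        }
        $active--;
        return $id * 10;
    };
};

$tasks = [];
foreach ([3, 1, 2, 1, 1] as $id => $steps) {
    $tasks[] = $makeTask($id, $steps);
}
$results = runBounded($tasks, 2);
ksort($results);
print_r($results);
```

Only tasks that have actually finished are removed from $running; the others keep their slots, and the limit is enforced inside the scheduler rather than by pausing the loop that submits tasks.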

I now store the coroutines returned by Amp\call in two separate arrays: one holds all coroutines, the other holds only those created since the last wait (the current loop).

$coroutine = Amp\call(function () use ($pool, $task) {
    return yield $pool->enqueue($task);
});
$loopRoutines[] = $coroutine;
$allCoroutines[] = $coroutine;

After enqueueing an item, I check whether the configured maximum number of threads has been reached. If the pool already has the maximum number of workers and none of them is idle, I call Amp\Promise\first on my current-loop coroutines to wait for a worker to become idle again.

Since that call would return immediately the next time I get there (because the finished coroutine is still in my current-loop array), I clear the array.

if ($pool->getWorkerCount() >= (THREAD_COUNT) && $pool->getIdleWorkerCount() === 0) {
    yield Amp\Promise\first($loopRoutines);
    $loopRoutines = [];
}

After the foreach, I call Amp\Promise\all on my all-coroutines array, so the script waits until all workers have finished.

Here is my changed code:

<?php
require dirname(__DIR__) . '/vendor/autoload.php';

$constants = get_defined_constants(true);
$constants = $constants['user'];
$maxFileCount = THREAD_CHUNKSIZE * THREAD_COUNT;
$i = 0;
$folder = opendir(LOOKUP_PATH);
$tasks = [];

while ($i < $maxFileCount && (false !== ($import_file = readdir($folder)))) {
    $fileParts = explode('.', $import_file);
    $ext = strtolower(end($fileParts));
    if ($ext === 'xml') {
        $filePath = LOOKUP_PATH . 'xml' . DIRECTORY_SEPARATOR . $import_file;
        $tasks[] = new ConvertPdfTask([$filePath], $constants);
    }
    $i++;
}
closedir($folder);

if (!empty($tasks)) {
    Amp\Loop::run(function () use ($tasks) {
        $allCoroutines = [];  // every task, awaited at the end
        $loopRoutines = [];   // tasks enqueued since the last wait
        $pool = new Amp\Parallel\Worker\DefaultPool(THREAD_COUNT);
        foreach ($tasks as $task) {
            $coroutine = Amp\call(function () use ($pool, $task) {
                return yield $pool->enqueue($task);
            });
            $loopRoutines[] = $coroutine;
            $allCoroutines[] = $coroutine;
            // Pool is at its size limit with no idle worker: pause until one of the
            // recently enqueued tasks succeeds, then start a fresh wait list.
            if ($pool->getWorkerCount() >= THREAD_COUNT && $pool->getIdleWorkerCount() === 0) {
                yield Amp\Promise\first($loopRoutines);
                $loopRoutines = [];
            }
        }
        // Wait for all remaining tasks before shutting the pool down.
        yield Amp\Promise\all($allCoroutines);

        return yield $pool->shutdown();
    });
}

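Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM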