puppeteer-cluster: queue instead of execute

I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously, but with queue they all fail, whereas only some fail when I call execute directly.

I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except that the code doesn't wait for the result. Is there a way to achieve this anyway?

So this works:

const screen = await cluster.execute(req.query.url);

But this breaks:

const screen = await cluster.queue(req.query.url);

Here's the full example with queue:

const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });
    await cluster.task(async ({ page, data: url }) => {
        // make a screenshot
        await page.goto('http://' + url);
        const screen = await page.screenshot();
        return screen;
    });

    // setup server
    app.get('/', async function (req, res) {
        if (!req.query.url) {
            return res.end('Please specify url like this: ?url=example.com');
        }
        try {
            const screen = await cluster.queue(req.query.url);

            // respond with image
            res.writeHead(200, {
                'Content-Type': 'image/png', // page.screenshot() produces PNG by default
                'Content-Length': screen.length // variable is undefined here
            });
            res.end(screen);
        } catch (err) {
            // catch error
            res.end('Error: ' + err.message);
        }
    });

    app.listen(3000, function () {
        console.log('Screenshot server listening on port 3000.');
    });
})();

What am I doing wrong here? I'd really like to use queuing, because without it every incoming request appears to slow down all the others.

Author of puppeteer-cluster here.

Quote from the docs:

cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.

cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
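The documented contract can be illustrated without launching a browser. The sketch below uses a hypothetical in-memory MiniCluster class (an assumption for illustration only; it is not part of puppeteer-cluster and is not how the library is implemented) that mimics just the behavior quoted above: queue returns a Promise that resolves to undefined immediately and reports failures only through a 'taskerror' event, while execute settles with the task's return value or rejects with the thrown error. The string returned by the task stands in for the Buffer that page.screenshot() would return.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for illustration only: it mimics the documented
// contract of cluster.queue and cluster.execute, but it is NOT the
// puppeteer-cluster implementation and launches no browser.
class MiniCluster extends EventEmitter {
  task(fn) {
    this.taskFn = fn;
  }

  // Like cluster.queue: schedules the job and returns immediately.
  // The returned Promise resolves to undefined; a failure is reported
  // only through a 'taskerror' event (and is silently lost if nobody listens).
  queue(data) {
    setImmediate(async () => {
      try {
        await this.taskFn({ data });
      } catch (err) {
        this.emit('taskerror', err, data);
      }
    });
    return Promise.resolve();
  }

  // Like cluster.execute: also schedules the job, but the Promise resolves
  // with the task's return value or rejects with the thrown error.
  execute(data) {
    return new Promise((resolve, reject) => {
      setImmediate(() => {
        Promise.resolve()
          .then(() => this.taskFn({ data }))
          .then(resolve, reject);
      });
    });
  }
}

async function demo() {
  const cluster = new MiniCluster();
  cluster.task(async ({ data: url }) => {
    if (url === 'bad.example') throw new Error('navigation failed');
    return `screenshot of ${url}`; // stands in for page.screenshot()
  });

  const taskErrors = [];
  cluster.on('taskerror', (err, data) => taskErrors.push(`${data}: ${err.message}`));

  const queued = await cluster.queue('example.com'); // undefined
  const executed = await cluster.execute('example.com'); // 'screenshot of example.com'

  await cluster.queue('bad.example'); // does not throw
  let executeError = null;
  try {
    await cluster.execute('bad.example');
  } catch (err) {
    executeError = err.message; // 'navigation failed'
  }

  // Give any still-pending queued job one more turn of the event loop.
  await new Promise((resolve) => setImmediate(resolve));
  return { queued, executed, executeError, taskErrors };
}

demo().then((result) => console.log(result));
```

Applied to the question's Express handler, the same distinction holds: await cluster.execute(req.query.url) yields the task's result, while await cluster.queue(req.query.url) yields undefined.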

When to use which function:

  • Use cluster.queue if you want to queue a large number of jobs (e.g. a list of URLs). The task function needs to take care of storing the results, for example by printing them to the console or storing them in a database.
  • Use cluster.execute if your task function returns a result. This will still queue the job, so it is like calling queue and additionally waiting for the job to finish. In this scenario, there is most often an "idling cluster" present, which is used when a request hits the server (as in your example code).
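The cluster.queue pattern from the first bullet can be sketched without a browser, under some loudly stated assumptions: MiniQueue is a hypothetical stand-in (not puppeteer-cluster code) whose concurrency limit plays the role of maxConcurrency and whose idle() method is modeled on cluster.idle(); a Map stands in for the database, and a short setTimeout stands in for page navigation. The point it shows is that nothing awaits the individual queue() calls: each task stores its own result, failures arrive via 'taskerror', and the caller only waits for the whole batch to drain.

```javascript
const { EventEmitter } = require('events');

// Hypothetical, minimal job queue for illustration: fire-and-forget queue()
// plus idle(), with a concurrency limit. Not puppeteer-cluster's code.
class MiniQueue extends EventEmitter {
  constructor(maxConcurrency, taskFn) {
    super();
    this.maxConcurrency = maxConcurrency;
    this.taskFn = taskFn;
    this.pending = [];
    this.running = 0;
    this.idleWaiters = [];
  }

  queue(data) {
    this.pending.push(data);
    this.pump();
    return Promise.resolve(); // resolves immediately, carries no result
  }

  // Start jobs while there is free capacity; wake idle() waiters when done.
  pump() {
    while (this.running < this.maxConcurrency && this.pending.length > 0) {
      const data = this.pending.shift();
      this.running += 1;
      Promise.resolve()
        .then(() => this.taskFn({ data }))
        .catch((err) => this.emit('taskerror', err, data))
        .finally(() => {
          this.running -= 1;
          this.pump();
        });
    }
    if (this.running === 0 && this.pending.length === 0) {
      this.idleWaiters.splice(0).forEach((resolve) => resolve());
    }
  }

  idle() {
    if (this.running === 0 && this.pending.length === 0) return Promise.resolve();
    return new Promise((resolve) => this.idleWaiters.push(resolve));
  }
}

async function crawl(urls) {
  const results = new Map(); // stands in for a database table
  let active = 0;
  let peak = 0; // highest number of tasks observed running at once
  const cluster = new MiniQueue(2, async ({ data: url }) => {
    active += 1;
    peak = Math.max(peak, active);
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated page.goto()
    active -= 1;
    if (url.startsWith('bad')) throw new Error(`cannot load ${url}`);
    results.set(url, `title of ${url}`); // the task stores its own result
  });
  const failures = [];
  cluster.on('taskerror', (err, url) => failures.push(url));

  urls.forEach((url) => cluster.queue(url)); // nothing is awaited per job
  await cluster.idle(); // wait until every queued job has finished
  return { results, failures, peak };
}

crawl(['example.com', 'example.org', 'bad.example', 'example.net']).then(({ results, failures }) => {
  console.log([...results.keys()], failures);
});
```

With the real library, the equivalent is calling cluster.queue(url) for each URL, then await cluster.idle() before await cluster.close().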

So you definitely want to use cluster.execute, since you want to wait for the result of the task function. (With cluster.queue, the awaited value is undefined, as the comment in your code notes, so screen.length throws and every request ends up in your catch block.) The reason you do not see the underlying task errors is, as quoted above, that errors from jobs added with cluster.queue are emitted via a taskerror event, whereas cluster.execute throws them directly (the Promise is rejected). Most likely your jobs fail in both cases, but the failures are only visible with cluster.execute.
