Guzzle: Parallel file download using Guzzle's Pool::batch() and `sink` option

You can execute HTTP requests in parallel using Guzzle's Pool::batch() method. It lets you set default options for the requests through the `options` key of the third parameter.

But what if I need different options for different requests in the pool? I would like to execute GET requests using a pool and stream each response to a different file on disk. There is a `sink` option for that. But how do I apply a different value of this option to each request?

Rastor's example is almost right, but it's incorrectly implemented if you want to provide "options" to the Pool() constructor.

He's missing the critical handling of the Pool options array, which is described in a comment in the Pool source code.

The Guzzle docs say:

When a function is yielded by the iterator, the function is provided the "request_options" array that should be merged on top of any existing options, and the function MUST then return a wait-able promise.

Also, if you look at the Pool() code below the comment I linked to, you can see that Guzzle's Pool calls the callable and gives it the Pool's "options" as the argument, precisely so that you are supposed to apply it to your request.

The correct precedence is

Per-request options > Pool options > Client defaults.

If you don't apply the Pool() object's options array to your requests, you will end up with severe bugs. For example, if you create new Pool($client, $requests(100), ['options'=>['timeout'=>30.0]]); with a closure that ignores its argument, your Pool options won't be applied at all: the closure never merges them into the request options, so they are simply discarded.
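The effect can be seen without touching the network. The following is a minimal sketch, not Guzzle code: dispatchLikePool() is a hypothetical stand-in for the Pool's dispatch loop, and the closures return plain option arrays instead of promises so that the merged result is visible. The paths and the timeout value are illustrative assumptions.

```php
<?php
// Hypothetical stand-in for the Pool's dispatch loop (NOT Guzzle code):
// it calls each yielded closure with the pool's "options" array.
function dispatchLikePool(iterable $requests, array $poolOptions): array
{
    $effective = [];
    foreach ($requests as $key => $requestFn) {
        $effective[$key] = $requestFn($poolOptions);
    }
    return $effective;
}

// Illustrative pool-level options.
$poolOptions = ['timeout' => 30.0, 'sink' => '/tmp/default.bin'];

$requests = [
    // Ignores its argument, so the pool options are silently discarded.
    'ignores' => function () {
        return ['sink' => '/tmp/0.jpg'];
    },
    // Merges pool options first, then per-request options on top (req > pool).
    'merges' => function (array $poolOpts) {
        return array_merge($poolOpts, ['sink' => '/tmp/1.jpg']);
    },
];

// Each entry holds the options that request would effectively be sent with.
$effective = dispatchLikePool($requests, $poolOptions);
var_export($effective);
```

The 'ignores' entry loses the timeout entirely, while the 'merges' entry keeps it and still overrides the pool's sink. Note that array_merge() is shallow, so a nested option such as headers is replaced as a whole rather than merged key by key.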

So here is the correct code with support for Pool() options:

<?php

use GuzzleHttp\Pool;

$client = new \GuzzleHttp\Client();

$requests = function ($total) use ($client) {
    for ($i = 0; $i < $total; $i++) {
        $url = "http://domain.com/picture/{$i}.jpg";
        $filepath = "/tmp/{$i}.jpg";

        yield function($poolOpts) use ($client, $url, $filepath) {
            /** Apply options as follows:
             * Client() defaults are given the lowest priority
             * (they're used for any values you don't specify on
             * the request or the pool). The Pool() "options"
             * override the Client defaults. And the per-request
             * options ($reqOpts) override everything (both the
             * Pool and the Client defaults).
             * In short: Per-Request > Pool Defaults > Client Defaults.
             */
            $reqOpts = [
                'sink' => $filepath
            ];
            if (is_array($poolOpts) && count($poolOpts) > 0) {
                $reqOpts = array_merge($poolOpts, $reqOpts); // req > pool
            }

            return $client->getAsync($url, $reqOpts);
        };
    }
};

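// Pool options are now honored because each closure merges them in.
$pool = new Pool($client, $requests(100), ['options' => ['timeout' => 30.0]]);

// Nothing is sent until the pool's promise is waited on.
$pool->promise()->wait();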

Note however that you don't have to support the Pool() options, if you know that you will never be adding any options to your new Pool() constructor. In that case, you can just look at the official Guzzle docs for an example.

The official example looks as follows:

use GuzzleHttp\Client;
use GuzzleHttp\Pool;

// Using a closure that will return a promise once the pool calls the closure.
$client = new Client();

$requests = function ($total) use ($client) {
    $uri = 'http://127.0.0.1:8126/guzzle-server/perf';
    for ($i = 0; $i < $total; $i++) {
        yield function() use ($client, $uri) {
            return $client->getAsync($uri);
        };
    }
};

$pool = new Pool($client, $requests(100));

You can specify the $options you want on each request individually. Options apply to all requests only if you pass them to the client. Here's an excerpt from the Guzzle 6 docs:

Headers may be added as default options when creating a client. When headers are used as default options, they are only applied if the request being created does not already contain the specific header. This includes both requests passed to the client in the send() and sendAsync() methods and requests created by the client (e.g., request() and requestAsync()).

See http://guzzle.readthedocs.org/en/latest/request-options.html?highlight=default#headers

For Guzzle 6:

use GuzzleHttp\Pool;

$client = new \GuzzleHttp\Client();

$requests = function ($total) use ($client) {
    for ($i = 0; $i < $total; $i++) {
        $url = "http://domain.com/picture/{$i}.jpg";
        $filepath = "/tmp/{$i}.jpg";

        yield function() use ($client, $url, $filepath) {
            return $client->getAsync($url, [
                'sink' => $filepath
            ]);
        };
    }
};

$pool = new Pool($client, $requests(100));

// Start the transfers and block until all of them have completed.
$pool->promise()->wait();
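Since the question asks about Pool::batch() specifically, the same closures can be handed to it directly. The following is a sketch under assumptions: Guzzle is installed through Composer (hence the vendor/autoload.php path), and the $downloads map, its URLs and target paths, and the concurrency and timeout values are all illustrative.

```php
<?php
require 'vendor/autoload.php'; // assumes guzzlehttp/guzzle installed via Composer

use GuzzleHttp\Client;
use GuzzleHttp\Pool;

$client = new Client();

// Hypothetical URL => target file map.
$downloads = [
    'http://domain.com/picture/0.jpg' => '/tmp/0.jpg',
    'http://domain.com/picture/1.jpg' => '/tmp/1.jpg',
];

$requests = function (array $downloads) use ($client) {
    foreach ($downloads as $url => $filepath) {
        // The pool calls this closure with its "options" array.
        yield function (array $poolOpts) use ($client, $url, $filepath) {
            // Per-request 'sink' overrides anything set at the pool level.
            return $client->getAsync($url, array_merge($poolOpts, ['sink' => $filepath]));
        };
    }
};

// batch() runs the pool to completion and returns one entry per request,
// in request order: either a response or the failure reason.
$results = Pool::batch($client, $requests($downloads), [
    'concurrency' => 5,
    'options' => ['timeout' => 30.0],
]);

foreach ($results as $index => $result) {
    if ($result instanceof \Throwable) {
        echo "Request {$index} failed: ", $result->getMessage(), PHP_EOL;
    }
}
```

Because batch() collects failures instead of throwing them, checking each entry as above is the only way to notice a download that did not succeed.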
