
High performance method for downloading large images to PHP server from URL

I have an array with about 100 links to images. The images are about 5-10 MB each. I want to loop through the array and download all of the pictures to my server. I have found several ways of doing it and started with file_get_contents, but it eats my memory.

I have also looked at

Wget

shell_exec('wget -O /var/www/html/images/image.gif http://www.google.com/images/logo_sm.gif');

PHP Copy

copy('http://example.com/image.php', 'local/folder/flower.jpg');

cURL

$url  = 'http://www.google.com/images/logo_sm.gif';
$path = '/var/www/html/images/images.gif';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$data = curl_exec($ch);

curl_close($ch);

file_put_contents($path, $data);

Each user has their "own" array (different image links). What is the fastest way to download the pictures to my server while using minimal resources (low memory usage, etc.)?

Time the cURL/wget/copy approaches and see: they'll have "roughly the same" throughput and they won't use excessive memory.¹ However, these approaches all suffer from the same problem: the downloads occur in series.

(While there are a number of factors involved, including bandwidth/latency and distribution between servers/handlers, adding some degree of parallelism will be the single biggest factor in improving the overall throughput.)

wget + parallel spawns

There are several ways to start generic parallel processes from PHP that can be combined with shell_exec and wget, so that the (wget) downloads themselves run in parallel.

Because direct access to what a particular process is doing is lost, this also involves finagling output redirection/processing. On the other hand, it is a relatively simple change from a one-shot shell exec.

The shell exec itself should also be hardened against injection attacks, e.g. by passing every interpolated value through escapeshellarg; security should not be discounted when using such shell access. A sketch combining both points follows below.
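A minimal sketch of the idea, assuming wget is installed and that $urls and a writable target directory already exist (the file-naming scheme here is made up for illustration). Each command's output is redirected to a per-download log and the trailing & detaches the process, so the loop does not wait for any individual download; escapeshellarg guards every value interpolated into the command:

$dir  = '/var/www/html/images';          // assumed writable target directory
$urls = [/* ... the user's image URLs ... */];

foreach ($urls as $i => $url) {
    $source = escapeshellarg($url);
    $target = escapeshellarg("$dir/image_$i.jpg");   // illustrative naming only
    $log    = escapeshellarg("$dir/wget_$i.log");

    // wget's progress/errors go to the per-download log; the trailing &
    // backgrounds the process so all downloads run concurrently.
    shell_exec("wget -O $target $source > $log 2>&1 &");
}

In practice you would probably also cap how many processes are spawned at once (see the note about limiting parallel requests below).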

cURL + multi exec

The more rewarding (though initially more complicated) approach is to use curl_multi_exec. Unlike curl_exec, multi exec allows cURL requests to be processed asynchronously and thus supports parallel operations.

The process is a bit convoluted, but generic examples are available (e.g. in the PHP documentation for curl_multi_exec), and there are some related questions on SO (although I have yet to find a 'killer question/answer' for this specific problem).

An implementation would probably also want to limit the number of parallel cURL requests.
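For illustration only (the function and variable names here are invented, not taken from any particular answer): a rolling curl_multi sketch that keeps at most $maxParallel transfers active at a time and streams each response straight to disk via CURLOPT_FILE, so no image is ever held in memory as a string. Error handling is omitted for brevity.

function download_all(array $urls, string $dir, int $maxParallel = 5): void
{
    $mh     = curl_multi_init();
    $queue  = $urls;     // URLs still waiting to be started
    $active = [];        // running transfers: ['ch' => handle, 'fp' => file pointer]

    $start = function () use (&$queue, &$active, $mh, $dir) {
        $url = array_shift($queue);
        $fp  = fopen($dir . '/' . basename(parse_url($url, PHP_URL_PATH)), 'wb'); // naive naming
        $ch  = curl_init($url);
        curl_setopt($ch, CURLOPT_FILE, $fp);             // stream the body straight to the file
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($mh, $ch);
        $active[] = ['ch' => $ch, 'fp' => $fp];
    };

    // Prime the pool with the first $maxParallel transfers.
    while ($queue && count($active) < $maxParallel) {
        $start();
    }

    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);                      // wait for activity instead of busy-looping
        }

        // Reap finished transfers and top the pool back up from the queue.
        while ($done = curl_multi_info_read($mh)) {
            foreach ($active as $i => $slot) {
                if ($slot['ch'] === $done['handle']) {
                    fclose($slot['fp']);
                    curl_multi_remove_handle($mh, $slot['ch']);
                    curl_close($slot['ch']);
                    unset($active[$i]);
                    break;
                }
            }
            if ($queue) {
                $start();
            }
        }
    } while ($running || $active);

    curl_multi_close($mh);
}

It could then be called per user, e.g. download_all($links, '/var/www/html/images');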

I recommend cURL because it avoids having to be 'extra careful' when dealing with the shell. If a shell exec must be used, consider saving the list of target URLs to a file first and then feeding that in via xargs or the like. Using cURL also allows associated feedback for individual requests.
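For example (a hypothetical sketch assuming GNU xargs and the same target directory as above): write the URLs to a temporary file, then let xargs run a capped number of wget processes in parallel, so that no URL is ever interpolated into the shell command itself:

$list = tempnam(sys_get_temp_dir(), 'urls');
file_put_contents($list, implode("\n", $urls) . "\n");

// -n 1: one URL per wget invocation; -P 4: at most four parallel processes;
// wget's -P option sets the directory the files are saved into.
shell_exec('xargs -n 1 -P 4 wget -q -P /var/www/html/images < ' . escapeshellarg($list));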

Since this operation is likely to 'take some time', the downloading should probably be done via a queue / off-request mechanism… but that's a different architectural can of worms.


¹ The problem with file_get_contents is that it fetches the downloaded data into a string, which can cause 'out of memory' conditions depending on file size and the PHP environment.

However, none of the cURL/wget/copy approaches (when done correctly) have this problem, as they stream directly to a file. (In the question, the cURL code can run out of memory because it does not stream to a file but rather invokes file_put_contents after the entire file has already been downloaded into memory.)
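For comparison, a streaming version of the single-download cURL snippet from the question: passing an open file handle via CURLOPT_FILE makes cURL write each received chunk directly to disk, so the full image never has to fit in memory.

$url  = 'http://www.google.com/images/logo_sm.gif';
$path = '/var/www/html/images/image.gif';

$fp = fopen($path, 'wb');               // open the target file for writing
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);    // chunks are written here as they arrive
curl_exec($ch);
curl_close($ch);
fclose($fp);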
