
Copying images from a URL list to my server all at once with PHP

I have a big list of image URLs in an HTML file, something like this:

<a href="http://example.com/image1.jpg">image1</a>
<a href="http://example.com/image2.jpg">image2</a>
<a href="http://example.com/image3.jpg">image3</a>
<a href="http://example.com/image4.jpg">image4</a>
<a href="http://example.com/image5.jpg">image5</a>
<a href="http://example.com/image6.jpg">image6</a>
<a href="http://example.com/image7.jpg">image7</a>

Around 50,000 images.

I want to make a small script that can copy all the images to my server, so I can have them at:

http://Mywebsite.com/images/image1.jpg
http://Mywebsite.com/images/image2.jpg
http://Mywebsite.com/images/image3.jpg
...

I want to loop over the list, and each URL should be deleted after its image is copied successfully, because if the page crashes while loading (or something similar) I can resume the loop without overwriting files or re-reading the same URLs. If there is a better solution to avoid overwriting and re-reading URLs, please tell me.

I would create a script that reads your HTML file line by line.
You can do that using fopen and fgets:

$handle = fopen("path/to/some/file", "r");
while ( ( $line = fgets( $handle ) ) !== false ) 
{
    // do something with $line
}

This way the whole file is not loaded into memory at once, so you don't have to worry about its size.

Then, after processing each line, I would write a lock file containing the current line number / index. If your script crashes and you restart it, the loop simply skips every line until its current index is higher than the index stored in the lock file.

The script

It might work, but in the end you should not simply copy-paste everything. I hope it helps you find your solution.

#!/usr/bin/env php
<?php
// I DID NOT TEST THIS! 
// but it should work.

$handle = fopen("path/to/the/html/file/containing/the/urls.html", "r");
$storage = "path/where/you/want/your/images/";
$lockFile = __DIR__.'/index.lock';
$index = 0;

// create the lock file if it does not exist yet
if ( !file_exists( $lockFile ) )
{
    file_put_contents( $lockFile, 0 );
}

// load the index to resume from
$start = (int) file_get_contents( $lockFile );

if ( $handle ) 
{
    // line by line step by step
    while ( ( $line = fgets( $handle ) ) !== false ) 
    {
        // advance the line index
        $index++;

        // the lock file holds the last line already processed,
        // so skip everything up to and including that line
        if ( $index <= $start )
        {
            continue;
        }

        // match the url from the anchor element
        if ( !preg_match( '/<a href="([^"]+)">/', $line, $url ) )
        {
            continue; // no link on this line, skip it
        }
        $url = $url[1];

        $file = basename( $url );

        // download only if the file does not exist yet
        if ( !file_exists( $storage.$file ) )
        {
            file_put_contents( $storage.$file, file_get_contents( $url ) );
        }

        // update the lock file
        file_put_contents( $lockFile, $index );
    }

    fclose($handle);
} 
else 
{
    throw new Exception( 'Could not open file.' );
} 
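Note that file_get_contents() gives you no timeout control and no HTTP status check, which matters over 50,000 downloads. As a sketch (untested, and the function names are my own, not part of the script above), the download line could be swapped for a cURL-based helper that only writes the file on a 200 response:

```php
<?php
// Sketch: a cURL-based replacement for the plain
// file_get_contents() download in the script above.

// Extract the href value from one line of the HTML list;
// returns null when the line contains no link.
function extract_url(string $line): ?string
{
    return preg_match('/<a href="([^"]+)">/', $line, $m) ? $m[1] : null;
}

// Download $url to $dest with timeouts; true only on HTTP 200.
function download(string $url, string $dest): bool
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds to establish the connection
        CURLOPT_TIMEOUT        => 60,    // seconds for the whole transfer
    ]);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false || $code !== 200) {
        return false; // network error or non-OK status
    }
    return file_put_contents($dest, $body) !== false;
}
```

With this in place, you would only advance the lock file when download() returns true, so a failed image is retried on the next run instead of being silently skipped.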

You can do something like this; of course, you should also add some error checking here :)

define("SITE_DIR", '/home/www/temp');

$file = file('in.txt');

foreach ($file AS $row){
    preg_match('/(?<=\")(.*?)(?=\")/', $row, $url);

    $path = parse_url($url[0], PHP_URL_PATH);
    $dirname = pathinfo($path, PATHINFO_DIRNAME);

    if (!is_dir(SITE_DIR . $dirname)){
        mkdir(SITE_DIR . $dirname, 0777, true);
    }

    file_put_contents(SITE_DIR. $path, file_get_contents($url[0]));
}
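As a sketch of what that error checking might look like (the helper names and messages are mine, not part of the answer above), the same loop with unparsable lines skipped and failed writes reported:

```php
<?php
// Sketch only: same idea as the loop above, with basic error
// checking added. SITE_DIR and in.txt come from that answer.
define("SITE_DIR", '/home/www/temp');

// Pull the first double-quoted value (the href) out of a line;
// returns null when the line has no quoted URL.
function extract_quoted(string $row): ?string
{
    return preg_match('/"([^"]+)"/', $row, $m) ? $m[1] : null;
}

// Map a remote URL to a local path under SITE_DIR, creating the
// directory tree as needed; null signals "skip this URL".
function local_path(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    $dir = SITE_DIR . pathinfo($path, PATHINFO_DIRNAME);
    if (!is_dir($dir) && !@mkdir($dir, 0777, true)) {
        return null;
    }
    return SITE_DIR . $path;
}

if (is_readable('in.txt')) {
    foreach (file('in.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $row) {
        $url = extract_quoted($row);
        if ($url === null) {
            continue; // no quoted URL on this line
        }
        $dest = local_path($url);
        if ($dest === null) {
            fwrite(STDERR, "skipping unparsable URL: $url\n");
            continue;
        }
        $data = @file_get_contents($url);
        if ($data === false || file_put_contents($dest, $data) === false) {
            fwrite(STDERR, "failed to copy $url\n");
        }
    }
}
```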
