Copying images from a URL list to my server all at once with PHP

I have a big list of URLs for images in an HTML file, something like this:

<a href="http://example.com/image1.jpg">image1</a>
<a href="http://example.com/image2.jpg">image2</a>
<a href="http://example.com/image3.jpg">image3</a>
<a href="http://example.com/image4.jpg">image4</a>
<a href="http://example.com/image5.jpg">image5</a>
<a href="http://example.com/image6.jpg">image6</a>
<a href="http://example.com/image7.jpg">image7</a>

Around 50,000 images.

I want to make a small script that can copy all the images to my server, so I can have them at:

http://Mywebsite.com/images/image1.jpg
http://Mywebsite.com/images/image2.jpg
http://Mywebsite.com/images/image3.jpg
...

I want to make a loop, and each URL in the list must be deleted after its image is copied successfully, because if the page crashes during loading or something similar, I can then continue my loop without overwriting files or reading the same URLs again. If there is a better solution that avoids overwriting and re-reading URLs, please tell me.

I would create a script that reads your HTML file line by line.
You can do that using fopen and fgets.

$handle = fopen("path/to/some/file", "r");
while ( ( $line = fgets( $handle ) ) !== false ) 
{
    // do something with $line
}

This way the file is not simply loaded into memory all at once, so you don't have to worry about its size.

Then after parsing every line I would write a lock file containing the current line number / index. So if your script crashes and you restart it, the iteration simply skips every line until its current index is higher than the index from the lock file.

The script

It might work, but in the end you should not simply copy and paste everything. I hope it helps you find your solution.

#!/usr/bin/env php
<?php
// I DID NOT TEST THIS! 
// but it should work.

$handle = fopen("path/to/the/html/file/containing/the/urls.html", "r");
$storage = "path/where/you/want/your/images/";
$lockFile = __DIR__.'/index.lock';
$index = 0;

// get the lock index
if ( !file_exists( $lockFile ) )
{
    file_put_contents( $lockFile, 0 );
}

// load the current index
$start = (int) file_get_contents( $lockFile );

if ( $handle ) 
{
    // line by line step by step
    while ( ( $line = fgets( $handle ) ) !== false ) 
    {
        // update the current line index
        $index++;

        // skip lines that were already processed before a crash
        if ( $index <= $start )
        {
            continue;
        }

        // match the url from the anchor element; skip lines without a link
        if ( !preg_match( '/<a href="(.+?)">/', $line, $url ) )
        {
            continue;
        }
        $url = $url[1];

        $file = basename( $url );

        // check if the file already exists 

        if ( !file_exists( $storage.$file ) )
        {
            file_put_contents( $storage.$file, file_get_contents( $url ) );
        }

        // update the lock file
        file_put_contents( $lockFile, $index );
    }

    fclose($handle);
} 
else 
{
    throw new Exception( 'Could not open file.' );
} 
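
If allow_url_fopen is disabled on your host, or you want a timeout and an HTTP status check for every download, you could replace the file_put_contents( ..., file_get_contents( $url ) ) line with a small cURL helper. This is only an untested sketch; download_image() is an illustrative name, not part of the script above:

// Hypothetical helper (not tested): download one image with cURL so that a
// timeout or a non-200 response does not silently write an empty file.
function download_image( $url, $destination )
{
    $ch = curl_init( $url );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true ); // return the body instead of printing it
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true ); // follow redirects
    curl_setopt( $ch, CURLOPT_TIMEOUT, 30 );          // give up after 30 seconds

    $data   = curl_exec( $ch );
    $status = curl_getinfo( $ch, CURLINFO_HTTP_CODE );
    curl_close( $ch );

    // only write the file when the download actually succeeded
    if ( $data === false || $status !== 200 )
    {
        return false;
    }

    return file_put_contents( $destination, $data ) !== false;
}

// usage inside the loop, instead of the plain file_put_contents() call:
// download_image( $url, $storage.$file );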

You can do something like this; of course you should also add some error checking here (see the sketch after the code) :)

define("SITE_DIR", '/home/www/temp');

// read the url list into an array, one row per line
$file = file('in.txt');

foreach ($file AS $row){
    // grab the url between the double quotes of the href attribute
    preg_match('/(?<=\")(.*?)(?=\")/', $row, $url);

    // mirror the directory structure of the source url under SITE_DIR
    $path = parse_url($url[0], PHP_URL_PATH);
    $dirname = pathinfo($path, PATHINFO_DIRNAME);

    if (!is_dir(SITE_DIR . $dirname)){
        mkdir(SITE_DIR . $dirname, 0777, true);
    }

    // download the image and save it to the same path on this server
    file_put_contents(SITE_DIR . $path, file_get_contents($url[0]));
}
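
For instance, the error checking could look roughly like this (an untested sketch): skip rows where the regex does not find a URL, and only write the file when the download actually returned data:

define("SITE_DIR", '/home/www/temp');

// read the url list, one row per line
$file = file('in.txt');

foreach ($file AS $row){
    // skip rows that do not contain a quoted url
    if (!preg_match('/(?<=\")(.*?)(?=\")/', $row, $url)){
        continue;
    }

    $path = parse_url($url[0], PHP_URL_PATH);
    $dirname = pathinfo($path, PATHINFO_DIRNAME);

    if (!is_dir(SITE_DIR . $dirname)){
        mkdir(SITE_DIR . $dirname, 0777, true);
    }

    // only save the file when the download succeeded, otherwise log it and move on
    $data = file_get_contents($url[0]);
    if ($data === false){
        error_log("Failed to download " . $url[0]);
        continue;
    }

    file_put_contents(SITE_DIR . $path, $data);
}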
