
Download images from HTML and keep the folder structure

I need to download over 100,000 pictures. The pictures are in .png, .jpg, .jpeg, and .gif format. I have approval to use them. They have provided me with an XML file containing all the URLs.

The URLs have the structure

otherdomain/productimages/code/imagename.jpg/.png/.gif

I have all the codes in a PHP array called $codes[], and I also have the full paths of all the images in an array called $images[].

I need to download all those pictures and keep the same structure:

mydomain/productimages/code/imagename.jpg/.png/.gif

What I have so far, from my research on the internet, is:

Looping over all the pages (one per hotel code):

    $i = 1;
    $r = 100000;

    while ($i < $r) {
        $html = get_data('http://otherdomain.com/productimages/'.$codes[$i].'/');
        getImages($html);
        $codes[$i++];
    }

    function getImages($html) {
        $matches = array();
        $regex = '~http://otherdomain.com/productimages/(.*?)\.jpg~i';
        preg_match_all($regex, $html, $matches);
        foreach ($matches[1] as $img) {
            saveImg($img);
        }
    }

    function saveImg($name) {
        $url = 'http://otherdomain.com/productimages/'.$name.'.jpg';
        $data = get_data($url);
        file_put_contents('photos/'.$name.'.jpg', $data);
    }

Could you help me get this working? The script doesn't work at all.

I would suggest an easier and faster approach to the task. Write the complete URLs to list.txt, then run the wget -x -i list.txt command, which will download all the images and put them in the appropriate directories according to the site structure.
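As a rough sketch of what that looks like in practice: the URLs below are made-up placeholders (the real ones would come from the XML file), and `preview_paths` is a hypothetical helper, not part of wget. It only prints the relative path that `-x` (`--force-directories`) is expected to create for each URL, so the layout can be checked before downloading anything. The actual wget call is left commented out so the snippet runs offline.

```shell
#!/bin/sh
# Hypothetical sample URLs; in practice list.txt holds the real ones, one per line.
cat > list.txt <<'EOF'
http://otherdomain.com/productimages/1001/front.jpg
http://otherdomain.com/productimages/1001/pool.png
http://otherdomain.com/productimages/1002/lobby.gif
EOF

# Hypothetical helper: show where "wget -x" would store each file.
# With -x the scheme is dropped and host + path become nested directories.
preview_paths() {
    sed -e 's~^[a-z]*://~~' "$1"
}

preview_paths list.txt

# The real download, run from the directory that should receive the files:
# wget -x -i list.txt
```

Since list.txt is just one URL per line, it can be produced by whatever already holds the URLs, for example by dumping the $images array to a file.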


It works very well. Do you happen to know whether I can set wget to download all the files to a certain location, e.g. the HTTP root folder?

wget downloads into the folder it is run from, so you can just cd to the target folder and run wget there.

Also, to complement @Hlorofos's answer, you can use -nH so that the folder structure doesn't include the host name.
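To make the effect of these options concrete, here is a small sketch. It assumes a document root of /var/www/html (a placeholder, adjust as needed) and uses wget's -P (--directory-prefix) option as an alternative to cd-ing into the folder first. `local_path` is a hypothetical helper that only computes where a file should end up when -x, -nH and -P are combined; the wget call itself is commented out so the snippet runs without network access.

```shell
#!/bin/sh
# Assumed target directory; replace with the real HTTP root.
DOCROOT=/var/www/html

# Hypothetical helper: expected location of a URL's file when wget is run
# with -x (keep directories), -nH (drop the host directory) and -P "$DOCROOT".
local_path() {
    rel=$(printf '%s\n' "$1" | sed -e 's~^[a-z]*://[^/]*/~~')
    printf '%s/%s\n' "$DOCROOT" "$rel"
}

local_path 'http://otherdomain.com/productimages/1001/front.jpg'

# The real download:
# wget -x -nH -P "$DOCROOT" -i list.txt
```

With this combination the files should land under productimages/code/ directly below the chosen root, which matches the mydomain/productimages/code/imagename layout asked for in the question.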
