
Python: Need to download images from URL in a loop (JPG and PNG)

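My school has e-books, but to access them I have to log in several times first, so it takes ages before I can read my books. On top of that, I need an internet connection to view them.

However, I found out that the books are stored on the publisher's server as separate images (both JPG and PNG), and now I want to download those images and combine them into a PDF file.

The problem I have is that the files are not generally accessible, so I need to give my script the exact URL, and I can't get it to move on to the next file.

This is what I have so far (Pastebin link):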

import os
import urllib
import requests
import sys
from time import sleep
from PIL import Image
from reportlab.lib.utils import ImageReader
from reportlab.lib import utils
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4
from reportlab.lib.units import cm
from subprocess import Popen, PIPE
import shutil
import nltk  
from urllib import urlopen

#Change those variables:

URL = "http://cdpcontent.toegang.nu/c436b908-7a8d-49ce-ae5e-24892fa06fd7/20140808123622/extract/assets/img/layout/page-00"

#_____________________________________________________

FILE_END_JPG = ".jpg"
FILE_END_PNG = ".png"
SAVE_TO_DIRECTORY = "images"

NUM = 1 # Default 1
MAX_NUM = 500

builded_link_jpg = URL + str(NUM) + FILE_END_JPG
builded_link_png = URL + str(NUM) + FILE_END_PNG

def link_alive(some_url):
        try:
                html = urlopen(some_url).read()
                four_zero_four = "De door u gevraagde pagina of resource kan helaas niet worden gevonden."
                if four_zero_four in html:
                        #print "Link dead."
                        return 0
                else:
                        #print "Link alive."
                        return 1

        except Exception as Error:
                print Error
                print "\nError in check_dead_link function.\n"


def save(NUM, MAX_NUM, SAVE_TO_DIRECTORY, FILE_END_PNG, FILE_END_JPG, URL):
        save_name = 0

        try:
                if not os.path.exists(SAVE_TO_DIRECTORY):
                        os.makedirs(SAVE_TO_DIRECTORY)
                        print SAVE_TO_DIRECTORY + " created."

                print "All images will be saved to the folder:", SAVE_TO_DIRECTORY + "\n"

                while NUM <= MAX_NUM:

                        if link_alive(builded_link_jpg) == 1:
                                print "This is a JPG page\n"
                                save_name = "%04d" % save_name
                                image = str(save_name) + FILE_END_JPG
                                save_name = int(save_name)
                                save_name += 1
                                urllib.urlretrieve(builded_link_jpg, SAVE_TO_DIRECTORY + "//" + image)
                                NUM += 1
                                print builded_link_jpg + " saved.\n"

                        else:
                                print "This is a PNG page\n"
                                save_name = "%04d"% save_name
                                image = str(save_name) + FILE_END_PNG
                                save_name = int(save_name)
                                save_name += 1
                                urllib.urlretrieve(builded_link_jpg, SAVE_TO_DIRECTORY + "//" + image)
                                NUM += 1
                                print builded_link_jpg + " saved.\n"

                print "Done saving all the images!"


        except Exception as Error:
                print Error
                print "\nFail in save function.\n"

save(NUM, MAX_NUM, SAVE_TO_DIRECTORY, FILE_END_PNG, FILE_END_JPG, URL)

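The part where I'm stuck is the last one: the loop keeps downloading the same picture, only under an increasing file name ;/

Can somebody help me?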

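You may want to update the value of builded_link_jpg inside the loop! It is built once, before save() runs, from the initial NUM, so incrementing NUM inside the while loop never changes the URL being downloaded. Note also that the PNG branch passes builded_link_jpg to urlretrieve instead of builded_link_png.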
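As a rough illustration of that fix, here is a minimal Python 3 sketch that rebuilds the URL on every pass through the loop. It is not the asker's script: the names `fetch`, `download_pages` and `NOT_FOUND_TEXT` are made up for this example, it uses only the standard library instead of `urllib.urlretrieve`, and it assumes a missing page shows up either as an HTTP error or as a body containing the Dutch "not found" message quoted in the question. Unlike the original, it skips a page number when neither the JPG nor the PNG exists rather than saving it anyway.

```python
import os
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# "Not found" message quoted in the question; treating it as the
# dead-link marker is an assumption carried over from link_alive().
NOT_FOUND_TEXT = b"De door u gevraagde pagina of resource kan helaas niet worden gevonden."


def fetch(url):
    """Return the response body as bytes, or None if the page looks missing."""
    try:
        with urlopen(url) as response:
            data = response.read()
    except (HTTPError, URLError):
        return None
    return None if NOT_FOUND_TEXT in data else data


def download_pages(base_url, save_dir, start=1, max_num=500, fetch=fetch):
    """Try <num>.jpg, then <num>.png, for every page number in range."""
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for num in range(start, max_num + 1):
        for ext in (".jpg", ".png"):
            # The URL is rebuilt from the current num on every iteration.
            url = base_url + str(num) + ext
            data = fetch(url)
            if data is None:
                continue  # this extension is not available, try the next
            # Files are numbered 0000, 0001, ... in the order they are saved.
            path = os.path.join(save_dir, "%04d%s" % (len(saved), ext))
            with open(path, "wb") as out:
                out.write(data)
            saved.append(path)
            break  # page found, move on to the next number
    return saved
```

With the variables from the question, a call would look like `download_pages(URL, SAVE_TO_DIRECTORY, NUM, MAX_NUM)`. The `fetch` parameter is only there so the loop can be tried against a stand-in function without touching the network.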

