Python: Need to download images from URL in a loop (JPG and PNG)
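My school provides e-books, but to reach them I have to log in several times, so it takes a long time before I can actually read a book. On top of that, I need an internet connection to view them.
However, I found out that the books are stored as individual images (both JPG and PNG) on the publisher's server, and now I want to download those images and combine them into a PDF file.
The problem I'm running into is that the files are not normally accessible, so I have to give the script the exact URL, and I can't get it to move on to the next file.
This is what I have so far (Pastebin link):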

    import os
    import urllib
    import requests
    import sys
    from time import sleep
    from PIL import Image
    from reportlab.lib.utils import ImageReader
    from reportlab.lib import utils
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.units import cm
    from subprocess import Popen, PIPE
    import shutil
    import nltk
    from urllib import urlopen

    #Change those variables:
    URL = "http://cdpcontent.toegang.nu/c436b908-7a8d-49ce-ae5e-24892fa06fd7/20140808123622/extract/assets/img/layout/page-00"
    #_____________________________________________________
    FILE_END_JPG = ".jpg"
    FILE_END_PNG = ".png"
    SAVE_TO_DIRECTORY = "images"
    NUM = 1 # Default 1
    MAX_NUM = 500
    builded_link_jpg = URL + str(NUM) + FILE_END_JPG
    builded_link_png = URL + str(NUM) + FILE_END_PNG

    def link_alive(some_url):
        try:
            html = urlopen(some_url).read()
            four_zero_four = "De door u gevraagde pagina of resource kan helaas niet worden gevonden."
            if four_zero_four in html:
                #print "Link dead."
                return 0
            else:
                #print "Link alive."
                return 1
        except Exception as Error:
            print Error
            print "\nError in check_dead_link function.\n"

    def save(NUM, MAX_NUM, SAVE_TO_DIRECTORY, FILE_END_PNG, FILE_END_JPG, URL):
        save_name = 0
        try:
            if not os.path.exists(SAVE_TO_DIRECTORY):
                os.makedirs(SAVE_TO_DIRECTORY)
                print SAVE_TO_DIRECTORY + " created."
            print "All images will be saved to the folder:", SAVE_TO_DIRECTORY + "\n"
            while NUM <= MAX_NUM:
                if link_alive(builded_link_jpg) == 1:
                    print "This is a JPG page\n"
                    save_name = "%04d" % save_name
                    image = str(save_name) + FILE_END_JPG
                    save_name = int(save_name)
                    save_name += 1
                    urllib.urlretrieve(builded_link_jpg, SAVE_TO_DIRECTORY + "//" + image)
                    NUM += 1
                    print builded_link_jpg + " saved.\n"
                else:
                    print "This is a PNG page\n"
                    save_name = "%04d"% save_name
                    image = str(save_name) + FILE_END_PNG
                    save_name = int(save_name)
                    save_name += 1
                    urllib.urlretrieve(builded_link_jpg, SAVE_TO_DIRECTORY + "//" + image)
                    NUM += 1
                    print builded_link_jpg + " saved.\n"
            print "Done saving all the images!"
        except Exception as Error:
            print Error
            print "\nFail in save function.\n"

    save(NUM, MAX_NUM, SAVE_TO_DIRECTORY, FILE_END_PNG, FILE_END_JPG, URL)

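The part I'm stuck on is the last bit: the script keeps downloading the same picture, only with an incrementing file name. :/
Can anyone help me?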
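You probably want to update the value of builded_link_jpg inside the loop! As posted, the link is built once at module level with NUM = 1, so every pass through the while loop requests page 1 again even though NUM and save_name keep increasing. The PNG branch has a second problem: it also fetches builded_link_jpg, so it never requests the .png file at all.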
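To make that suggestion concrete, here is a minimal Python 3 sketch of the loop with the link rebuilt on every iteration. It is an illustration under stated assumptions, not a drop-in replacement for the original script: the names fetch, download_pages and the fetcher parameter are hypothetical, the code assumes a missing page shows up either as an HTTP error status or as a body containing the "not found" message quoted in the question, and it names each file after its page number instead of keeping a separate counter.

```python
import os
import urllib.error
import urllib.request

EXTENSIONS = (".jpg", ".png")
# "Not found" text quoted in the question; assumed to appear in error pages.
NOT_FOUND_MARKER = (
    b"De door u gevraagde pagina of resource kan helaas niet worden gevonden."
)


def fetch(url):
    """Return the body as bytes, or None if the page looks missing.

    Network failures other than HTTP error statuses are left to propagate.
    """
    try:
        with urllib.request.urlopen(url) as response:
            body = response.read()
    except urllib.error.HTTPError:
        return None
    if NOT_FOUND_MARKER in body:
        return None
    return body


def download_pages(base_url, first, last, directory, fetcher=fetch):
    """Save each page as JPG or PNG, whichever the server actually has."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for page in range(first, last + 1):
        for ext in EXTENSIONS:
            # Build the link inside the loop so it changes with `page`.
            url = base_url + str(page) + ext
            data = fetcher(url)
            if data is None:
                continue  # try the next extension
            path = os.path.join(directory, "%04d%s" % (page, ext))
            with open(path, "wb") as handle:
                handle.write(data)
            saved.append(path)
            break  # page saved, move on to the next page number
    return saved
```

With the variables from the question, a call would look like download_pages(URL, NUM, MAX_NUM, SAVE_TO_DIRECTORY). Passing the fetch function in as a parameter is only a convenience, so the loop can be tried out with a stand-in function before pointing it at the real server.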