Downloading images from a link using python code
I am downloading images from a link but I am facing some problems. It shows "found 0 links" and then "downloaded 0 files".

Here's the code:
import urllib.request
import re
import os

# the directory where the images are saved
DIRECTORY = "book"
# the url of the html page containing the images
URL = "https://www.inaturalist.org/taxa/56061-Alliaria-petiolata/browse_photos"
# the regex to extract the image urls from the html page
REGEX = '(?<=<a href=")http://\d.bp.inaturalist.org/[^"]+'
# the prefix of the image file name
PREFIX = 'page_'

if not os.path.isdir(DIRECTORY):
    os.mkdir(DIRECTORY)

contents = urllib.request.urlopen(URL).read().decode('utf-8')
links = re.findall(REGEX, contents)
print("Found {} links".format(len(links)))
print("Starting download...")

page_number = 1
total = len(links)
downloaded = 0
for link in links:
    filename = "{}/{}{}.jpg".format(DIRECTORY, PREFIX, page_number)
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(link, filename)
        downloaded = downloaded + 1
        print("done: {} ({}/{})".format(filename, downloaded, total))
    else:
        downloaded = downloaded + 1
        print("skip: {} ({}/{})".format(filename, downloaded, total))
    page_number = page_number + 1
print("Downloaded {} files".format(downloaded))
How can I fix this?
I just fixed your regex and changed some logic. This script should work properly:
import urllib.request
import re
import os

# the directory where the images are saved
DIRECTORY = "book"
# the url of the html page containing the images
URL = "https://www.inaturalist.org/taxa/56061-Alliaria-petiolata/browse_photos"
# the regex to extract the image urls from the html page
REGEX = re.compile(r'(?:(?:https?)+\:\/\/+[a-zA-Z0-9\/\._-]{1,})+(?:(?:jpe?g|png|gif))')
# the prefix of the image file name
PREFIX = 'page_'

if not os.path.isdir(DIRECTORY):
    os.mkdir(DIRECTORY)

contents = urllib.request.urlopen(URL).read().decode('utf-8')
links = re.findall(REGEX, contents)
print("Found {} links".format(len(links)))
print("Starting download...")

page_number = 1
total = len(links)
downloaded = 0
for link in links:
    ext = link.split('.')[-1]
    filename = "{}/{}{}.{}".format(DIRECTORY, PREFIX, page_number, ext)
    urllib.request.urlretrieve(link, filename)
    downloaded = downloaded + 1
    print("done: {} ({}/{})".format(filename, downloaded, total))
    page_number = page_number + 1
print("Downloaded {} files".format(total))
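One more thing to watch for: some servers return HTTP 403 for Python's default user agent. If that happens, wrapping each URL in a urllib.request.Request with a browser-like User-Agent header usually helps. A minimal sketch (the URL here is hypothetical):

```python
import urllib.request

def make_request(url):
    # Attach a browser-like User-Agent so the server doesn't reject
    # Python's default one. Note: urllib stores header keys capitalized
    # as "User-agent" internally.
    return urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0"},
    )

req = make_request("https://example.org/image.jpg")  # hypothetical URL
print(req.get_header("User-agent"))  # Mozilla/5.0
```

You would then fetch with urllib.request.urlopen(req) and write the bytes to a file yourself, since urlretrieve only accepts a plain URL string, not a Request object.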
By the way, I'd suggest using a library/framework for this job (e.g. Scrapy, BeautifulSoup, etc.).
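Even without installing anything, the standard library's html.parser is more robust than a hand-written regex for pulling image links out of HTML. A minimal sketch (the HTML snippet and its URLs are made up for illustration):

```python
from html.parser import HTMLParser

class ImageLinkParser(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.links.append(value)

parser = ImageLinkParser()
parser.feed('<div><img src="https://example.org/a.jpg">'
            '<img src="https://example.org/b.png"></div>')
print(parser.links)  # ['https://example.org/a.jpg', 'https://example.org/b.png']
```

In your case you would feed it the downloaded page contents instead of the inline snippet, then loop over parser.links exactly as the script above loops over the regex matches.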