Download all images in a web directory

Question

I'm trying to use BeautifulSoup4 to collect all the images in a specific directory on my web server.

So far I have this code:

from init import *
from bs4 import BeautifulSoup
import urllib
import urllib.request
# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urllib.request.urlopen(url)
    return BeautifulSoup(html, features="html.parser")

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + "images found.")
    print ('Downloading images to current working directory.')
    #compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.request.Request(each, filename)
    return image_links

#a standard call looks like this
get_images('https://omabilder.000webhostapp.com/img/')

However, this throws the following error:

7images found.
Downloading images to current working directory.
Traceback (most recent call last):
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 1, in <module>
    from init import *
  File "C:\Users\MyPC\Desktop\oma projekt\init.py", line 9, in <module>
    from getpics import *
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 26, in <module>
    get_images('https://omabilder.000webhostapp.com/img/')
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 22, in get_images
    urllib.request.Request(each, filename)
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 328, in __init__
    self.full_url = url
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 354, in full_url
    self._parse()
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '/icons/blank.gif'
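
The ValueError can be reproduced without any network access. The sketch below (standard library only; build_request, PAGE_URL and SRC are illustrative names, not from the original code) shows that urllib.request.Request rejects a root-relative path such as /icons/blank.gif because it has no scheme, and that resolving it against the page URL with urljoin yields an absolute URL it accepts. Constructing a Request does not send anything.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Illustrative values: the page from the question and the src from the traceback.
PAGE_URL = "https://omabilder.000webhostapp.com/img/"
SRC = "/icons/blank.gif"


def build_request(src, page_url=None):
    """Build (but do not send) a Request for an <img> src value.

    If page_url is given, src is first resolved against it, which turns a
    root-relative path into an absolute URL.
    """
    url = urljoin(page_url, src) if page_url else src
    return Request(url)


# A bare root-relative path has no scheme, so Request raises ValueError.
try:
    build_request(SRC)
except ValueError as exc:
    print(exc)

# Resolved against the page URL, it becomes a valid absolute URL.
print(build_request(SRC, PAGE_URL).full_url)
```

Resolving against the page URL only fixes the URL type; it still fetches the listing icons rather than the uploaded pictures.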

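What I don't understand is this:

There are no GIFs in the directory, and there is no /icons/ subdirectory. Also, it reports 7 images when only 3 images have been uploaded to the site.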

Answer 1

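The GIFs are the icons (tiny, roughly 20x20 px images) shown next to each link on that page. They really are displayed on the site, so findAll('img') finds them rather than your uploads. The error itself happens because their src values are root-relative paths such as /icons/blank.gif, and urllib only accepts absolute URLs. If I understand correctly, you want to download the PNG images, but at the URL you provided those are links (<a href=...>), not images (<img>).

If you want to download the PNG images from the links, you can use something like this: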
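
To see the difference between <img> tags and <a> links offline, here is a sketch that uses the standard library's html.parser instead of BeautifulSoup. SAMPLE_LISTING is a hypothetical, trimmed-down listing page: only /icons/blank.gif comes from the question, while the other icon paths and the file names oma1.png and oma2.png are made up for illustration. The real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical listing markup: a small icon <img> next to every <a> link.
SAMPLE_LISTING = """
<table>
<tr><th><img src="/icons/blank.gif" alt="[ICO]"></th><th>Name</th></tr>
<tr><td><img src="/icons/back.gif" alt="[PARENTDIR]"></td>
    <td><a href="/">Parent Directory</a></td></tr>
<tr><td><img src="/icons/image2.gif" alt="[IMG]"></td>
    <td><a href="oma1.png">oma1.png</a></td></tr>
<tr><td><img src="/icons/image2.gif" alt="[IMG]"></td>
    <td><a href="oma2.png">oma2.png</a></td></tr>
</table>
"""


class ListingParser(HTMLParser):
    """Collect <img src> values and <a href> values separately."""

    def __init__(self):
        super().__init__()
        self.img_srcs = []
        self.link_hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "img" and attrs.get("src"):
            self.img_srcs.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.link_hrefs.append(attrs["href"])


parser = ListingParser()
parser.feed(SAMPLE_LISTING)

page_url = "https://omabilder.000webhostapp.com/img/"
# Keep only links to .png files and make them absolute.
png_urls = [urljoin(page_url, h) for h in parser.link_hrefs if h.endswith(".png")]

print(parser.img_srcs)  # only the /icons/*.gif listing icons
print(png_urls)         # the actual PNG files, as absolute URLs
```

Every <img> here is a listing icon, so counting <img> tags counts icons, not uploaded files.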

from bs4 import BeautifulSoup
import os
import urllib.request
from urllib.parse import urljoin, urlparse
# run this script from the directory
# you want to save the scraped images to

def make_soup(url):
    html = urllib.request.urlopen(url)
    return BeautifulSoup(html, features="html.parser")

def get_images(url):
    soup = make_soup(url)
    # collect the href of every link (<a> tag) on the page
    links = [link["href"] for link in soup.find_all('a', href=True)]
    # keep only the links that point to .png files
    images = [href for href in links if href.endswith(".png")]
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    for each in images:
        # resolve the (usually relative) href against the page URL;
        # urljoin is meant for URLs, os.path.join is for file paths
        full_url = urljoin(url, each)
        # save the file under the last component of the URL path
        filename = os.path.basename(urlparse(full_url).path)
        urllib.request.urlretrieve(full_url, filename)
    return images

# a standard call looks like this
get_images('https://omabilder.000webhostapp.com/img/')
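
The URL and file-name handling can be checked without downloading anything. The sketch below pulls it into a hypothetical helper, plan_downloads, that returns (absolute URL, local file name) pairs. It assumes hrefs may be relative, root-relative, or percent-encoded (for example a space written as %20); the href list is made up for illustration. Note also that the question's urllib.request.Request(each, filename) only builds a request object (passing filename as the data argument) and never fetches anything; urlretrieve is what actually downloads the file.

```python
import os
from urllib.parse import unquote, urljoin, urlparse


def plan_downloads(page_url, hrefs, ext=".png"):
    """Return (absolute_url, local_filename) pairs for hrefs whose path ends in ext.

    urljoin resolves relative and root-relative hrefs the way a browser
    would; the local file name is the last path component, percent-decoded.
    """
    plan = []
    for href in hrefs:
        full_url = urljoin(page_url, href)
        path = urlparse(full_url).path
        if not path.lower().endswith(ext):
            continue  # skip parent-directory links, sort links, other files
        filename = unquote(os.path.basename(path))
        plan.append((full_url, filename))
    return plan


# Hypothetical hrefs of the kinds a directory listing can contain.
hrefs = ["/", "?C=N;O=D", "oma%20bild.png", "/img/oma2.png", "notes.txt"]
for url, name in plan_downloads("https://omabilder.000webhostapp.com/img/", hrefs):
    print(url, "->", name)
```

Each pair can then be passed to urllib.request.urlretrieve(url, name) inside get_images.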

Answer 1 (accepted), 2019-09-18 20:45:54
