
How to download all images from url with python 2.7 - Problem

I tried to download all the images from 'https://www.nytimes.com/section/todayspaper' with the following code:

import requests
from io import open as iopen
from urlparse import urlsplit

file_url= 'https://www.nytimes.com/section/todayspaper'
def requests_image(file_url):
    suffix_list = ['jpg', 'gif', 'png', 'tif', 'svg',]
    file_name =  urlsplit(file_url)[2].split('/')[-1]
    file_suffix = file_name.split('.')[1]
    i = requests.get(file_url)
    if file_suffix in suffix_list and i.status_code == requests.codes.ok:
        with iopen(file_name, 'wb') as file:
            file.write(i.content)
    else:
        return False

No error occurred when I ran it:

>>> 
>>> 

But I don't know where on my PC the images were downloaded to?

I checked the Downloads folder, but they aren't there.
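For what it's worth, two details in the posted snippet explain the silent run: `requests_image` is defined but never actually called, and even if it were called, the page URL ends in `todayspaper`, which contains no dot, so `file_name.split('.')[1]` would raise an `IndexError` before anything could be saved. A minimal sketch of that second point (using Python 3's `urllib.parse`; the original imports Python 2's `urlparse`):

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit

file_url = 'https://www.nytimes.com/section/todayspaper'

# Same filename extraction as the posted function
file_name = urlsplit(file_url)[2].split('/')[-1]
print(file_name)         # 'todayspaper' - a page name, not an image file
print('.' in file_name)  # False - so file_name.split('.')[1] raises IndexError
```

So nothing was ever downloaded: the function body never ran.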

To download all the images from a page, you should:

  • Download the page HTML
  • Find all image tags (`<img>`)
  • Scan every image tag and extract the contents of its src attribute
  • Download each file from the collected links

import os
import hashlib

import requests
from bs4 import BeautifulSoup


page_url = 'https://www.nytimes.com/section/todayspaper'

# Download page html 
page_data = requests.get(page_url).text

# Find all links in page
images_urls = [
    image.attrs.get('src')
    for image in BeautifulSoup(page_data, 'lxml').find_all('img')
]

# Drop empty links (<img src="" />, <img>, etc.)
images_urls = [
    image_url
    for image_url in images_urls
    if image_url
]

# Download files
def download_image(source_url, dest_dir):
    # TODO: add filename extension
    image_name = hashlib.md5(source_url.encode()).hexdigest()

    # Create the destination directory if it does not exist yet,
    # otherwise open() fails with IOError/FileNotFoundError
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)

    with open(os.path.join(dest_dir, image_name), 'wb') as f:
        image_data = requests.get(source_url).content
        f.write(image_data)


for image_url in images_urls:
    download_image(image_url, './tmp')
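One caveat the loop above does not handle: scraped `src` values are often relative (`/images/a.png`) or protocol-relative (`//cdn.example.com/b.jpg`), and `requests.get` cannot fetch those directly. A small sketch of normalizing them against `page_url` with `urljoin` before downloading (the example paths are hypothetical):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

page_url = 'https://www.nytimes.com/section/todayspaper'

# Hypothetical src values as they might appear in <img> tags
print(urljoin(page_url, '/images/a.png'))           # https://www.nytimes.com/images/a.png
print(urljoin(page_url, '//cdn.example.com/b.jpg')) # https://cdn.example.com/b.jpg
print(urljoin(page_url, 'https://static.example.com/c.gif'))  # absolute URLs pass through
```

In the loop you would then call `download_image(urljoin(page_url, image_url), './tmp')` instead of passing the raw `src` value.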

