How to save all images from a website in Python

Question

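For my image processing practice, I want some images from this site: https://511ny.org/cctv, but it seems I cannot access their "src" attributes to use with BeautifulSoup and extract the images. If you have any solution to this problem, please let me know. Here is my code, which does not return anything: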

from bs4 import BeautifulSoup
from urllib.request import urlopen

response = urlopen('https://511ny.org/cctv')
soup = BeautifulSoup(response, 'html.parser')
pics = soup.findAll('img')
for pic in pics:
    print('img src: ', pic['src'])

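I also looked into another approach, downloading all the images directly from the website, but I could not find any tutorial that does this with Python.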

Answer 1

The images on this site are not present in the initial HTML file; they are loaded dynamically by JavaScript, which BeautifulSoup/urllib will not execute for you.
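To illustrate the point, here is a minimal sketch using only the standard library's html.parser (the same parser backend the question passes to BeautifulSoup). The HTML string is a made-up stand-in for a dynamic page, not the actual markup of 511ny.org; only the cctvTable id is borrowed from this thread. A static parser sees the script as plain text and never runs it, so it finds no img tags.

```python
from html.parser import HTMLParser

# Hypothetical page: the <img> tag is created by a script at run time,
# so it is absent from the HTML that the server sends.
STATIC_HTML = """
<html><body>
  <table id="cctvTable"><tbody></tbody></table>
  <script>
    // A browser would run this and add an image; a parser will not.
    var img = document.createElement('img');
    img.src = '/example/camera.jpg';
    document.querySelector('#cctvTable tbody').appendChild(img);
  </script>
</body></html>
"""


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in the static markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))


collector = ImgCollector()
collector.feed(STATIC_HTML)
print("img tags found in static HTML:", len(collector.sources))
```

If the count is zero for a page that clearly shows images in a browser, the images are most likely being added by JavaScript after the page loads.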

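To scrape a dynamic website, you should use a browser automation tool such as Selenium, which has a Python library and can drive a headless browser. Such browsers behave just like normal browsers, with one difference: they are controlled by your code rather than by a user.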
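A rough sketch of that approach with Selenium and headless Chrome might look like the following. It assumes Selenium 4.6 or later (so that the driver binary is located automatically) and a local Chrome install; the function names absolute_image_urls and collect_rendered_image_urls are made up for this example, and the code simply grabs every rendered img tag rather than anything specific to this site.

```python
from urllib.parse import urljoin

PAGE_URL = "https://511ny.org/cctv"


def absolute_image_urls(page_url, srcs):
    """Resolve src values against the page URL, dropping empty ones and duplicates."""
    seen = []
    for src in srcs:
        if not src:
            continue
        url = urljoin(page_url, src)
        if url not in seen:
            seen.append(url)
    return seen


def collect_rendered_image_urls(page_url=PAGE_URL, timeout=10):
    """Open the page in headless Chrome and return the src of each rendered <img>."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Wait until JavaScript has added at least one <img> to the page.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.TAG_NAME, "img")
        )
        images = driver.find_elements(By.TAG_NAME, "img")
        srcs = [img.get_attribute("src") for img in images]
        return absolute_image_urls(page_url, srcs)
    finally:
        driver.quit()
```

Calling collect_rendered_image_urls() returns a list of image URLs that can then be downloaded with urllib or requests. Waiting for "at least one img" is a simplification; a page that loads images in batches may need a more specific wait condition.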

A better alternative to Selenium is Puppeteer, but I have only used it in Node.js, and I am not sure about the quality of its Python bindings.

Answer 2

Hi, I did it this way: I built an XPath for each image and then read its src attribute.

import urllib.request

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Path to the local chromedriver executable; adjust for your machine.
PATH = r'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(service=Service(PATH))  # Selenium 4 style
driver.get('https://511ny.org/cctv')

try:
    # Wait up to 10 seconds for the JavaScript-rendered table body to appear.
    main = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="cctvTable"]/tbody'))
    )  # I used the XPath of the table
    print(main.text)
except TimeoutException:
    # Without the table there is nothing to scrape, so close the browser and stop.
    driver.quit()
    raise

items = main.find_elements(By.TAG_NAME, 'tr')  # I use the tr tag

for item in items:
    # print(item.text)
    # Get the row id
    identificador = item.get_attribute('data-id')

    # Build the XPath for this row's image and read its src
    xpath = '//*[@id="{}img"]'.format(identificador)
    imagenes = item.find_elements(By.XPATH, xpath)
    if not imagenes:
        continue  # skip rows that have no image element
    src = imagenes[0].get_attribute('src')
    urllib.request.urlretrieve(src, '{}.jpg'.format(identificador))

driver.quit()

Thanks
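As a possible refinement of the download step in the answer above, the saving logic could be pulled into a small helper that skips rows without an id or src and writes everything into one folder. This is only a sketch: save_images and its parameters are hypothetical names, and the .jpg extension is simply carried over from the answer's code rather than checked against the actual image type.

```python
import urllib.request
from pathlib import Path


def save_images(sources, out_dir, retrieve=urllib.request.urlretrieve):
    """Download each {identifier: src} pair to out_dir/<identifier>.jpg.

    Entries with a missing identifier or src are skipped.
    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for identifier, src in sources.items():
        if not identifier or not src:
            continue  # nothing usable in this row
        target = out_path / "{}.jpg".format(identifier)
        retrieve(src, str(target))
        saved.append(target)
    return saved
```

Inside the loop over the table rows, one would collect sources[identificador] = src into a dict and then call save_images(sources, 'cctv_images') once at the end. The retrieve parameter is there so the download function can be swapped out, for example for testing.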

Solution 1
1 2020-10-24 21:13:19

Solution 2
1 Accepted 2020-10-24 21:51:16
