
How do I scrape multiple images from a website and save them in a specific folder using Selenium in Python?

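I want to scrape the images from a website's search results. If there are no search results, however, the script should print a message saying so and then close the browser. For example, I want to search for xcom on https://gog-games.com/. It has three results, so the script should save them in a folder, print a success message, and then close the browser. How would I go about this? Here is my code so far: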

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

search = input("What game are you looking for? ")
#if there is a result:
PATH = r"C:\Coding\Codes\Python\msedgedriver.exe"
driver = webdriver.Edge(PATH)
driver.maximize_window()
driver.get("https://gog-games.com/")
searchbox = driver.find_element_by_xpath("/html/body/header/div/form/input")
searchbox.send_keys(search)
searchbox.send_keys(Keys.ENTER)
#save each image at "C:\Coding\Codes\Python\Image_Dataset\image_name"
#close the browser
#print("Results saved")

#elif there is no result:
    #close the browser
    #print("There is no result")

You can get the src attribute of every <img> in the HTML and download the images with either the requests module or urllib.request.

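import os
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.edge.service import Service

# --- constants ---

PATH = r"C:\Coding\Codes\Python\msedgedriver.exe"
FOLDER = r"C:\Coding\Codes\Python\Image_Dataset\image_name"
#FOLDER = 'test'

os.makedirs(FOLDER, exist_ok=True)

# ---

#search = input("What game are you looking for? ")
search = "Punk Wars"

# Selenium 4 passes the driver path through a Service object
driver = webdriver.Edge(service=Service(PATH))
#driver = webdriver.Chrome()
#driver = webdriver.Firefox()
driver.maximize_window()

try:
    driver.get("https://gog-games.com/")

    # find_element_by_xpath() was removed in Selenium 4.3; use find_element(By.XPATH, ...)
    searchbox = driver.find_element(By.XPATH, '//form[@id="search-bar"]/input')
    searchbox.send_keys(search)
    searchbox.send_keys(Keys.ENTER)

    time.sleep(3)  # crude wait for the results page to load

    all_images = driver.find_elements(By.XPATH, '//div[@class="container search"]//img[@src]')

    if not all_images:
        # no <img> inside the search container means the search found nothing
        print("There is no result")
    else:
        for img in all_images:
            url = img.get_attribute('src')
            filename = url.split("/")[-1]
            print('url:', url)
            print('filename:', filename)
            print('-----')

            full_path = os.path.join(FOLDER, filename)

            response = requests.get(url, timeout=30)
            response.raise_for_status()  # don't save an HTML error page as an image
            with open(full_path, "wb") as fh:
                fh.write(response.content)

        print("Results saved")
finally:
    # close the browser whether or not anything was found
    driver.quit()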
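The answer mentions urllib.request as an alternative to requests but only shows the requests version. Below is a minimal standard-library sketch of the download step. It also derives the filename with urllib.parse, because url.split("/")[-1] keeps any ?query part (for example cover.jpg?v=2), and "?" is not allowed in Windows filenames. The helper names filename_from_url and download_image are hypothetical, and the sketch assumes the image server accepts urllib's default User-Agent (some sites reject it).

```python
import os
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="image"):
    """Return the last path segment of url, ignoring any ?query or #fragment."""
    # urlsplit() separates the path from the query and fragment;
    # unquote() turns %20 and similar escapes back into characters.
    name = os.path.basename(unquote(urlsplit(url).path))
    return name or default  # a URL ending in "/" has no usable segment


def download_image(url, folder):
    """Hypothetical helper: save url into folder with urllib.request, return the path."""
    os.makedirs(folder, exist_ok=True)
    full_path = os.path.join(folder, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=30) as response, open(full_path, "wb") as fh:
        fh.write(response.read())
    return full_path
```

Inside the Selenium loop, the requests lines could then be replaced by a single call such as download_image(img.get_attribute('src'), FOLDER).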


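Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.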

 
© 2020-2024 STACKOOM.COM