How to download images with Selenium and Python

I'm trying to download some images (say, the first 10) from a website. The problem is that I don't know how HTML works.

What I have done so far:

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_elements_by_tag_name('img')
list = []
for image in images:
    print(image.get_attribute('src'))
    list.append(image.get_attribute('src'))

I want to download the images at the center of the page, but the program only retrieves the images in the left sidebar. My attempt to solve this problem is:

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)


# this part is to close the cookies pop up
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_element_by_class_name("page").get_attribute("img")

list = []
for image in images:
    print(image.get_attribute('src'))
    # list.append(image.get_attribute('src'))
    # print("list:", list)
    time.sleep(1)

but I got the following error:

Traceback (most recent call last):
  File "C:/Users/asus/PycharmProjects/project1/36.py", line 14, in <module>
    for image in images:
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1

  1. The <div class=page> element does not have an img attribute, so get_attribute("img") returns None, and the for loop then fails on it. You have to look for the <img> tags instead.
  2. find_element_by_* returns only one element. To get a list of elements, you have to use find_elements_by_*.
  3. To get the images from the posts, you have to target the images inside the posts. Try the following XPath: //div[contains(@id,'stream-')]//div[@class='post-container']//picture/img
  4. Remember that the GIFs on the page are not served as images inside an <img> tag, so this method only retrieves the still images.
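
To make point 1 concrete, here is a small sketch that uses Python's built-in html.parser rather than Selenium. The markup in SAMPLE_HTML is made up and far simpler than the real page, and the TagLogger class is a hypothetical helper; the only aim is to show that img is a child tag of the div, not one of its attributes.

```python
from html.parser import HTMLParser

# Made-up markup for illustration; the real page is far more complex.
SAMPLE_HTML = '<div class="page"><img src="a.jpg"><img src="b.jpg"></div>'


class TagLogger(HTMLParser):
    """Record every start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


parser = TagLogger()
parser.feed(SAMPLE_HTML)

# The first recorded tag is the div. It has a "class" attribute but no
# "img" attribute, so the lookup gives None, much like get_attribute("img").
div_attrs = parser.tags[0][1]
print(div_attrs.get("img"))

# The image URLs sit on the child <img> tags instead.
image_sources = [attrs["src"] for tag, attrs in parser.tags if tag == "img"]
print(image_sources)
```

Looping over that None value is what produces the TypeError in the traceback above.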

Try this: 尝试这个:

# find_elements_by_xpath returns a list with every matching <img> element
images = driver.find_elements_by_xpath("//div[contains(@id,'stream-')]//div[@class='post-container']//picture/img")
image_urls = []
for image in images:
    src = image.get_attribute('src')
    print(src)
    image_urls.append(src)

This puts the source URL of every image found into the list.
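
The code above stops at the list of URLs, while the question asks to download the first 10 images. A minimal sketch of that last step could look like the following. It is not part of the original answer: the helper names collect_image_urls and download_images, the image_NN file naming and the .jpg fallback are all assumptions made for illustration. It uses the two-argument driver.find_elements(by, value) form, because Selenium 4.3 and later no longer provide the find_elements_by_* helpers, and it fetches the files with urllib from the standard library. Some sites may refuse requests that lack browser-like headers, which this sketch does not handle.

```python
import os
import urllib.request
from urllib.parse import urlparse

# XPath from the answer; it may stop matching if the site's markup changes.
POST_IMAGE_XPATH = (
    "//div[contains(@id,'stream-')]"
    "//div[@class='post-container']//picture/img"
)


def collect_image_urls(driver, limit=10):
    """Return up to `limit` non-empty src values of the post images."""
    # "xpath" is the value of Selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", POST_IMAGE_XPATH)
    urls = []
    for element in elements:
        if len(urls) >= limit:
            break
        src = element.get_attribute("src")
        if src:  # skip <img> elements that have no src
            urls.append(src)
    return urls


def download_images(urls, dest_dir):
    """Save each URL into dest_dir and return the paths that were written."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        # Reuse the extension from the URL path, falling back to .jpg.
        extension = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        path = os.path.join(dest_dir, f"image_{index:02d}{extension}")
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

With a running driver that has already loaded the page, the two helpers would be combined as download_images(collect_image_urls(driver), "images").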
