
How to download images with Selenium and Python

I'm trying to download some images (let's say the first 10) from a website. The problem is that I don't know how HTML works.

What I did so far:

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")  # raw string keeps the backslashes literal
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_elements_by_tag_name('img')
list = []
for image in images:
    print(image.get_attribute('src'))
    list.append(image.get_attribute('src'))

I want to download the images in the center of the page, but the program only retrieves the images in the left sidebar. My attempt to solve this problem is:

from selenium import webdriver
import time

driver = webdriver.Chrome(r"C:\web_driver\chromedriver")  # raw string keeps the backslashes literal
url = "https://9gag.com/"
driver.get(url)

time.sleep(5)

# this part closes the cookie pop-up
driver.find_element_by_xpath("/html/body/div[7]/div[1]/div[2]/div/div[3]/button[2]/span").click()

images = driver.find_element_by_class_name("page").get_attribute("img")

list = []
for image in images:
    print(image.get_attribute('src'))
    # list.append(image.get_attribute('src'))
    # print("list:", list)
    time.sleep(1)

but I got the following error:

Traceback (most recent call last):
  File "C:/Users/asus/PycharmProjects/project1/36.py", line 14, in <module>
    for image in images:
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1

  1. The element <div class="page"> has no img attribute, so get_attribute("img") returns None, and iterating over None raises the TypeError. You have to look for the <img> tags instead.
  2. find_element_by_ only returns a single element. To get a list of elements, use find_elements_by_.
  3. To get the images from the posts, target the <img> tags inside the posts. Try the following XPath: //div[contains(@id,'stream-')]//div[@class='post-container']//picture/img
  4. Remember that GIFs are not served as images inside an <img> tag, so this method will only get you the still images.
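
Points 1 to 3 can be sketched as a small helper. This is only a sketch under a few assumptions: it uses the Selenium 4 form find_elements(by, value), since the find_elements_by_* helpers were removed in Selenium 4.3; collect_image_urls and its limit parameter are hypothetical names, not part of Selenium; and the XPath from point 3 may no longer match the site's current markup.

```python
# XPath taken from point 3 above; it may stop matching if the site's markup changes.
POST_IMAGES_XPATH = (
    "//div[contains(@id,'stream-')]"
    "//div[@class='post-container']//picture/img"
)


def collect_image_urls(driver, limit=10):
    """Return up to `limit` distinct image URLs found inside posts (hypothetical helper)."""
    # find_elements always returns a list (possibly empty), never None.
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", POST_IMAGES_XPATH)
    urls = []
    for element in elements:
        src = element.get_attribute("src")
        # get_attribute returns None when the attribute is missing, so skip those.
        if src and src not in urls:
            urls.append(src)
        if len(urls) >= limit:
            break
    return urls
```

With a live session this would be called as collect_image_urls(driver) once the cookie pop-up has been closed. An empty result most likely means the XPath no longer matches the page.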

Try this:

images = driver.find_elements_by_xpath("//div[contains(@id,'stream-')]//div[@class='post-container']//picture/img")
list = []
for image in images:
    print(image.get_attribute('src'))
    list.append(image.get_attribute('src'))

This puts the src URL of every image it finds into the list.
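
Collecting the src values does not save anything to disk yet. As a rough sketch of the remaining step, the URLs could be fetched with the standard library. The names download_images, dest_dir and limit are hypothetical, the .jpg fallback extension is a guess, and the browser-like User-Agent header is an assumption about what the site may require.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_images(urls, dest_dir="images", limit=10):
    """Save up to `limit` URLs into `dest_dir` and return the saved file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls[:limit]):
        # Keep the extension from the URL path, falling back to .jpg (a guess).
        extension = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        path = os.path.join(dest_dir, f"image_{index:02d}{extension}")
        # Assumption: some sites reject requests without a browser-like User-Agent.
        request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(request, timeout=30) as response:
            with open(path, "wb") as out:
                out.write(response.read())
        saved.append(path)
    return saved
```

Passing it the list of src values collected above would write the first 10 images as image_00, image_01 and so on into an images folder next to the script.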

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 