
How do I scrape just one specific image using Python Selenium?

I would like to scrape an image from a website and store it in a specified folder, but all the tutorials I can find only show how to scrape multiple images. For example, I would like to scrape the puppy image that appears right away at https://duckduckgo.com/?q=Puppy&t=h_&ia=web and save it on my desktop. How do I go about this?

The code I have figured out so far is:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

PATH = r"C:\Coding\Codes\Python\edgedriver\msedgedriver.exe"  # raw string so the backslashes are kept literally
driver = webdriver.Edge(PATH)
driver.maximize_window()
driver.get("https://duckduckgo.com/")

searchbox = driver.find_element_by_id("search_form_input_homepage")
searchbox.send_keys("Puppy")
searchbox.send_keys(Keys.ENTER)

#then save the puppy's image to a specified folder, say inside C:\Users\John\Desktop

You can use the urllib.request library:

import random
import string
import urllib.request

from selenium.webdriver.common.by import By

# Replace 'your xpath' with the XPath of the image you want.
sampleImage = driver.find_element(By.XPATH, 'your xpath').get_attribute('src')
characters = 5
letters = string.ascii_lowercase
# Build a file name from random lowercase letters.
img_str = ''.join(random.choice(letters) for i in range(characters))
fullname = img_str + '.jpg'
filepath = 'E:\\crawling\\IMG\\' + fullname
urllib.request.urlretrieve(sampleImage, filepath)
print(fullname)

I hope this works for you. I use the random library to give the image a name made of random characters.
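The random-name step can also be pulled into a small helper. This is only a sketch and not part of the answer above: the function name random_image_path, its parameters, and the use of pathlib are assumptions made for illustration.

```python
import random
import string
from pathlib import Path


def random_image_path(folder, length=5, ext=".jpg"):
    """Return folder/<random lowercase letters><ext> (hypothetical helper)."""
    letters = string.ascii_lowercase
    name = "".join(random.choice(letters) for _ in range(length))
    return Path(folder) / (name + ext)


# Example: build a target path inside a folder named "IMG".
path = random_image_path("IMG")
print(path.name)  # five random letters followed by ".jpg"
```

You could then pass str(path) to urllib.request.urlretrieve. Keep in mind that random names can collide: five lowercase letters give 26**5 = 11,881,376 combinations, so a longer name is safer when saving many files.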

Here is the code if you want to loop over several images:

import random
import string
import urllib.request

from selenium.webdriver.common.by import By

# Sample XPath that matches every table row holding an image.
row_xpath = '//*[@id="w0"]/div[1]/div/div/div/div/div/div/div[1]/table/tbody/tr'
imagename = []
for row in driver.find_elements(By.XPATH, row_xpath):
    # Find the image inside the current row (XPath relative to the row).
    sampleImage = row.find_element(By.XPATH, './td[1]/img').get_attribute('src')
    print(sampleImage)
    characters = 10
    letters = string.ascii_lowercase
    img_str = ''.join(random.choice(letters) for i in range(characters))
    fullname = img_str + '.jpg'
    filepath = 'E:\\crawling\\IMG-FARAH\\' + fullname
    urllib.request.urlretrieve(sampleImage, filepath)
    imagename.append(fullname)
    print(fullname)

I've included a sample XPath; the loop moves on to the next table row on each pass.
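If it helps, the download loop can be separated from the Selenium lookup so that it is easier to try out on its own. The sketch below is an assumption-laden variant, not the answer's code: download_all and its fetch parameter are hypothetical names, and it uses sequential file names instead of random ones so that two images can never get the same name.

```python
import urllib.request
from pathlib import Path


def download_all(srcs, folder, fetch=urllib.request.urlretrieve):
    """Save every URL in srcs into folder and return the file names used.

    fetch is any callable taking (url, filepath); it defaults to
    urllib.request.urlretrieve and can be swapped out for testing.
    """
    folder = Path(folder)
    saved = []
    for index, src in enumerate(srcs, start=1):
        if not src:
            # get_attribute('src') returns None when the attribute is missing.
            continue
        filename = f"image_{index:03d}.jpg"  # assumes the images are JPEGs
        fetch(src, str(folder / filename))
        saved.append(filename)
    return saved


# Usage with Selenium (assumes `images` is a list of <img> elements):
# srcs = [img.get_attribute("src") for img in images]
# print(download_all(srcs, "E:\\crawling\\IMG-FARAH"))
```

The target folder has to exist already, just as in the loop above.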

To scrape the value of the src attribute of the only image, you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

     print(driver.find_element(By.CSS_SELECTOR, "a.module__image>img").get_attribute("src"))
  • Using XPATH:

     print(driver.find_element(By.XPATH, "//a[@class='module__image']/img").get_attribute("src"))

Ideally, you should wait until the image is visible by using WebDriverWait with visibility_of_element_located(). You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "a.module__image>img"))).get_attribute("src"))
  • Using XPATH:

     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//a[@class='module__image']/img"))).get_attribute("src"))
  • Console Output:

     https://duckduckgo.com/i/a49fa21e.jpg
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
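Once you have the src value, saving it to a folder takes only the standard library. The following is a minimal sketch rather than a definitive implementation: filename_from_src and save_image are hypothetical helpers, and the fallback name image.jpg is an assumption for URLs that carry no file name.

```python
import posixpath
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def filename_from_src(src, default="image.jpg"):
    """Take the file name from the URL path, e.g. 'a49fa21e.jpg'."""
    parts = urlsplit(src)
    if parts.scheme == "data":
        # Inline data: URIs carry no file name, so fall back to the default.
        return default
    return posixpath.basename(parts.path) or default


def save_image(src, folder):
    """Download src into folder and return the path of the saved file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / filename_from_src(src)
    with urllib.request.urlopen(src) as response:
        target.write_bytes(response.read())
    return target


print(filename_from_src("https://duckduckgo.com/i/a49fa21e.jpg"))  # a49fa21e.jpg

# Usage (assumes `src` holds the value printed by the Selenium code above):
# save_image(src, r"C:\Users\John\Desktop")
```

Taking the name from the URL keeps the original extension instead of assuming .jpg for every image.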

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 