
I want to get the first 10 image URLs from Google search using Selenium Python

I want to get the first 10 image URLs from a Google search (actual URLs, not base64 data). This is my code:

import os
import base64
import time

from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

searchterm = 'bananas'  # will also be the name of the folder
url = "https://www.google.com/search?q=banan&source=lnms&tbm=isch&sa=X&ved=2ahUKEwj-75rDlJLoAhWLHHcKHStFC6EQ_AUoAXoECA4QAw&biw=1867&bih=951"
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
browser = webdriver.Chrome(executable_path=ChromeDriverManager().install(), options=options)
browser.get(url)
actions = webdriver.common.action_chains.ActionChains(browser)
header = {
    'User-Agent': "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"}
counter = 0
succounter = 0

if not os.path.exists(searchterm):
    os.mkdir(searchterm)

for i in range(10):
    time.sleep(1)
    # click the i-th thumbnail to open its preview panel
    x = browser.find_elements_by_xpath('//*[@id="islrg"]/descendant::img')[i]
    x.click()
    # read the src of the enlarged image in the preview panel
    ba = browser.find_element_by_xpath(
        '//*[@id="Sva75c"]/div/div/div[3]/div[2]/div/div[1]/div[1]/div/div[2]/a/img')
    print(ba.get_attribute('src'))

It returns image URLs, but sometimes it returns base64 data instead. How can I make the script always return an image URL? Thank you.

Change the XPath to select the link rather than the image, and then read its href.

ba = browser.find_element_by_xpath("//div[@class='islrc']//a[@href][@rel='noopener']")
print(ba.get_attribute("href"))
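
Whichever attribute is read, it may help to filter the value before using it. The sketch below is not part of the answer above. It assumes that inline thumbnails arrive as `data:` URIs and that result links have the form `/imgres?imgurl=<encoded URL>&...`, which is a guess about Google's markup and may not hold. The helper name `extract_image_url` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_image_url(value):
    """Return a direct http(s) image URL from a src/href value, or None.

    Assumption: result links look like '/imgres?imgurl=<encoded URL>&...'.
    """
    if not value or value.startswith("data:"):
        # inline base64 thumbnail (or empty attribute), not a usable URL
        return None
    parsed = urlparse(value)
    if parsed.path.endswith("/imgres"):
        # the target image is carried in the 'imgurl' query parameter
        candidates = parse_qs(parsed.query).get("imgurl", [])
        return candidates[0] if candidates else None
    if parsed.scheme in ("http", "https"):
        # already a plain URL
        return value
    return None
```

Inside the loop, one could call `extract_image_url(ba.get_attribute("href"))` and skip any result that comes back as `None`.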

If you scrape a different search engine, DuckDuckGo, you always get image URLs rather than base64 data. For example:

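from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

search_query = 'what you want to find'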
num_images = 10
driver_location = '/put/location/of/your/driver/here'

# setting up the driver
ser = Service(driver_location)
op = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=ser, options=op)

# searching the query
driver.get(f'https://duckduckgo.com/?q={search_query}&kl=us-en&ia=web')

# going to Images Section
ba = driver.find_element(By.XPATH, "//a[@class='zcm__link  js-zci-link  js-zci-link--images']")
ba.click()

# getting the images URLs
for result in driver.find_elements(By.CSS_SELECTOR, '.js-images-link')[:num_images]:
    # this answer reads the image URL from the element's data-id attribute
    imageURL = result.get_attribute('data-id')
    print(f'{imageURL}\n')

driver.quit()
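
The query above is placed into the URL unescaped, so a search term containing spaces or `&` may produce a malformed address. Here is a minimal sketch of building the same URL with the standard library's `urlencode`. The parameter names `q`, `kl` and `ia` are taken from the code above, and the function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(search_query):
    """Build the DuckDuckGo search URL with the query safely encoded."""
    # same parameters as the f-string in the answer: q, kl=us-en, ia=web
    params = {"q": search_query, "kl": "us-en", "ia": "web"}
    return "https://duckduckgo.com/?" + urlencode(params)
```

The result can be passed straight to `driver.get(build_search_url(search_query))`.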

