
Retrieving the URL of the first Google image search result using Python and Selenium

Ever since the API was deprecated, it's been very hard to retrieve the Google image search URL using Selenium. I've scoured Stack Overflow, but most of the answers to this question are from years ago, when scraping search engines was simpler.

I'm looking for a way to return the URL of the first image in a Google search query. I've tried everything in Selenium, from clicks, to retrieving the innerHTML of elements, to my most recent attempt: using ActionChains to navigate to the URL of the picture and then returning the current URL.

def GoogleImager(searchterm, musedict):
    page = "http://www.google.com/"
    landing = driver.get(page)
    actions = ActionChains(driver)
    WebDriverWait(landing, '10')
    images = driver.find_element_by_link_text('Images').click()
    actions.move_to_element(images)
    searchbox = driver.find_element_by_css_selector('#lst-ib')
    WebDriverWait(searchbox, '10')



    sendsearch = searchbox.send_keys('{} "logo" {}'.format('Museum of Bad Art', 'bos')+Keys.ENTER)
    WebDriverWait(sendsearch, '10')
    logo = driver.find_element_by_xpath('//*[@id="rg_s"]/div[1]/a').click()
    WebDriverWait(logo, '10')
    logolink = driver.find_element_by_xpath('//*[@id="irc_cc"]/div[3]/div[1]/div[2]/div[2]/a')
    WebDriverWait(logolink, '10')
    actions.move_to_element(logolink).click(logolink)  

    print(driver.current_url)
    return driver.current_url

I'm using this to return the first image for a museum name and city in the search.

I tried to make your code work with Google, got frustrated, and switched to Yahoo instead. I couldn't make heads or tails of your musedict access loops, so I substituted a simple dictionary for demonstration purposes:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

museum_dictionary = {"louvre": "Paris", "prado": "Madrid"}

driver = webdriver.Firefox()

def click_when_ready(locator, timeout=4):
    # WebDriverWait only blocks when .until() is called; constructing it alone does nothing.
    # On timeout, the raised TimeoutException carries the locator as its message.
    wait = WebDriverWait(driver, timeout)
    wait.until(EC.element_to_be_clickable(locator), message=repr(locator)).click()

def YahooImager(searchterm):
    page = "https://images.search.yahoo.com"

    driver.get(page)

    assert "Yahoo Image Search" in driver.title

    searchbox = driver.find_element(By.NAME, "p")  # Find the query box

    city = museum_dictionary[searchterm]

    searchbox.send_keys("{} {}".format(searchterm, city) + Keys.RETURN)

    # Open the first result, then follow its "View Image" link
    click_when_ready((By.XPATH, '//*[@id="resitem-0"]/a'))
    click_when_ready((By.LINK_TEXT, "View Image"))

    # driver.close()

    return driver.current_url

image_url = YahooImager("prado")

print(repr(image_url))

It works, but takes quite a while. (That's probably something someone who knows these libraries better could optimize -- I just wanted to see it work at all.) This example is fragile and occasionally just fails.
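
Since the lookup occasionally fails, one possible hedge is to retry it a few times before giving up. The following is only a minimal sketch in plain Python: the helper name retry and its parameters (attempts, delay, exceptions) are hypothetical and are not part of Selenium, and which exception types are worth retrying is an assumption left to the caller.

```python
import time

def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper, not a Selenium API. Only the exception types
    listed in `exceptions` trigger another attempt; the last failure
    is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # Out of attempts: surface the original error
            time.sleep(delay)  # Brief pause before the next try
```

Something like retry(lambda: YahooImager("prado"), attempts=3) would then wrap the whole lookup. This does not make any single attempt faster or more reliable; it only papers over intermittent failures, so a lookup that fails every time still raises after the last attempt.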
