How to extract the text from Google features in Python?

Question

By Google features I mean, for example, what happens when you type "I'm feeling curious" into Google: the first result is a random fact, followed by the regular results. I am trying to extract the random fact's text in Python. I tried the requests and bs4 libraries, but the random fact feature does not appear in the HTML that requests returns.

Is there some other way to extract the text?

Answer 1

The text can be extracted through the UI with Selenium WebDriver and Python. However, the selectors won't be stable, because the class names change on every page load. For example, the XPath for the question text looks like //*[@id="rso"]/div/div/div/div/div/div/div/div/div[1]/div .
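One way to soften the selector problem is to keep several candidate XPaths and use the first one that matches. The sketch below is only an illustration and not part of the original answer: first_match, fake_page and the candidate XPaths are hypothetical names. In real use, find_all would be something like lambda xp: browser.find_elements(By.XPATH, xp), which returns an empty list when nothing matches.

```python
from typing import Callable, Iterable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_match(find_all: Callable[[str], Sequence[T]],
                locators: Iterable[str]) -> Optional[T]:
    """Return the first element found by any locator, or None if none match."""
    for locator in locators:
        found = find_all(locator)
        if found:
            return found[0]
    return None


# Stand-in for a browser lookup so the sketch runs without Selenium.
# The XPaths here are made up for the demonstration.
fake_page = {'//*[@id="rso"]//p/a': ["link element"]}
candidates = ['//*[@id="old-layout"]//p/a', '//*[@id="rso"]//p/a']

result = first_match(lambda xp: fake_page.get(xp, []), candidates)
print(result)  # prints: link element
```

Trying locators in order keeps the most specific selector first and falls back to looser ones, so a layout change fails only when every candidate stops matching.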

Still, it is possible. Look at the example below:

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Positional XPaths into the result card; expect them to break when Google changes its markup.
QUESTION_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div[1]/div'
ANSWER_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div[2]/div'
URL_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div/p/a'

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 2})
# Selenium 4 takes options=; the older chrome_options= keyword is no longer accepted.
browser = webdriver.Chrome(options=chrome_options)

try:
    browser.get("https://www.google.com")
    # The search box is located by its name "q"; the id "lst-ib" used in 2018 may no longer exist.
    search_box = browser.find_element(By.NAME, "q")
    search_box.send_keys("I'm feeling curious")
    search_box.submit()

    wait = WebDriverWait(browser, 5)
    question = wait.until(EC.presence_of_element_located((By.XPATH, QUESTION_XPATH)))
    answer = wait.until(EC.presence_of_element_located((By.XPATH, ANSWER_XPATH)))

    # The answer text can be filled in late, so re-read it up to 3 times, one second apart.
    count = 3
    while not answer.text and count:
        count -= 1
        sleep(1)
        answer = browser.find_element(By.XPATH, ANSWER_XPATH)
    url = wait.until(EC.presence_of_element_located((By.XPATH, URL_XPATH))).get_attribute('href')

    print('Question: {}\nAnswer: {}\nUrl: {}'.format(question.text, answer.text, url))
finally:
    # Always close the headless browser, even if a lookup fails.
    browser.quit()

You can run this code after installing Selenium (pip install selenium) and any other dependencies it needs, such as Chrome and a matching driver.
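The retry loop in the example can be pulled out into a small helper that always terminates. This is a minimal sketch, not part of the original answer: poll_text and its parameters are hypothetical names, and the stand-in reader only mimics an element whose text is rendered late.

```python
import time
from typing import Callable


def poll_text(read_text: Callable[[], str], attempts: int = 3, delay: float = 1.0) -> str:
    """Call read_text until it returns non-empty text or the retries run out."""
    text = read_text()
    while not text and attempts > 0:
        attempts -= 1          # count down so the loop cannot run forever
        time.sleep(delay)
        text = read_text()
    return text


# Stand-in reader: empty twice, then returns text, like a late-rendered element.
responses = iter(["", "", "A random fact."])
fact = poll_text(lambda: next(responses), attempts=3, delay=0.01)
print(fact)  # prints: A random fact.
```

With Selenium, the reader could be something like lambda: browser.find_element(By.XPATH, answer_xpath).text, where answer_xpath is whichever XPath currently points at the answer. If the text never appears, the helper returns an empty string instead of hanging.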
