简体   繁体   中英

How to extract the text $7.56 from the webpage using Selenium through Python

  1. Go to: https://www.goodrx.com/amoxicillin
  2. Right click on $7.56 (or any price) -> copy xpath in the chrome dev tools

I've tried all these variations:

find_element(By.XPATH, '// *[ @ id = "uat-price-row-coupon-1"] / div[3] / div[1] / text()')  
find_element(By.XPATH, "//*[@id='uat-price-row-coupon-0']/div[3]/div[1]/text()")  
find_element_by_xpath("//*[@id='uat-price-row-coupon-1']/div[3]/div[1]/text()")  

I also verified it works in "Try Xpath" in Firefox

But I get "no such element" from selenium with all of them.

Am I missing something?

Use WebDriverWait to wait elements visibility. The website has bot protection, be ready for captcha.

import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ...

wait = WebDriverWait(driver, 20)
with driver:
    driver.get("https://www.goodrx.com/amoxicillin")

    rows = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li[data-qa="price_row"]')))
    for row in rows:
        store_name = row.find_element_by_css_selector('[class^="goldAddUnderline"]').text.strip()
        drug_price = row.find_element_by_css_selector('[data-qa="drug_price"]').text.strip()
        drug_price = re.findall(r"\d+.\d+", drug_price)[0]
        print(store_name, drug_price)

To extract the text $7.56 as it is a text node you you have to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies :

  • Using CSS_SELECTOR :

     driver.get('https://www.goodrx.com/amoxicillin') element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul[aria-label='List of best coupons by price and pharmacy.']>li div[data-qa='drug_price']"))) print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip())
  • Using XPATH :

     driver.get('https://www.goodrx.com/amoxicillin') element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@aria-label='List of best coupons by price and pharmacy.']/li//div[@data-qa='drug_price']"))) print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip())
  • Console Output:

     $7.56
  • Note : You have to add the following imports :

     from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.common.by import By from selenium.webdriver.support import expected_conditions as EC

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM