
Could not extract text with Python Selenium

I have written the following code to extract the price from the URL below.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.89 Safari/537.36')
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.get("https://www.walmart.com/ip/Fitness-Reality-TR3000-Maximum-Weight-Capacity-Manual-Treadmill-with-Pacer-Control-and-Heart-Rate-System/37455841#")
driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").text

It returns an empty string, although the price is present inside that tag.

When I extract the innerHTML of that tag instead of its text, using the following:

driver.find_element_by_css_selector("div[itemprop='price']:nth-of-type(1)").get_attribute("innerHTML")

I get this result:

u' <span class="Price-sup">$</span>199<span class="Price-mark">.</span><span class="Price-sup">00</span> '

It clearly shows that the text 199 is inside the tag, but I couldn't extract it. Am I missing anything here?

To get the price, you need the following code:

price = driver.find_element_by_xpath('//div[@itemprop="price"]').text
# The <span class="Price-mark">.</span> element is hidden, so .text omits
# the decimal point and returns $19900, which is not what you expect.
# Re-insert the decimal point before the last two digits yourself.
price = '.'.join([price[:-2], price[-2:]])

Result: $199.00
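
As an alternative to re-inserting the decimal point by slicing, the price can be rebuilt from the innerHTML string shown in the question by concatenating its text nodes. The following is only a minimal sketch in Python 3 using the standard library; the helper names `_TextCollector` and `price_from_inner_html` are hypothetical and not part of Selenium, and the sample string is the innerHTML quoted in the question.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called once for each run of text between tags.
        self.parts.append(data)


def price_from_inner_html(inner_html):
    """Join all text nodes of the fragment and strip surrounding whitespace."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts).strip()


# innerHTML value quoted in the question
sample = (' <span class="Price-sup">$</span>199'
          '<span class="Price-mark">.</span>'
          '<span class="Price-sup">00</span> ')
print(price_from_inner_html(sample))
```

With Selenium, the input would be the value returned by get_attribute("innerHTML") on the price element. Reading get_attribute("textContent") is another commonly used way to get the text of elements that .text skips, though whether it behaves the same under PhantomJS on this page is an assumption that has not been verified here.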
