
How to extract text from an SVG using Python Selenium

I'm trying to scrape the price from this page: https://www.kbb.com/cadillac/deville/1996/sedan-4d/ [image: the price range shown on the page]

The prices are shown in text tags inside an svg tag.

When I use the XPath .//*[name()='svg']//*[name()='g']//*[name()='text'] in the browser's inspect-element panel, it finds the tags. But the same XPath does not work in my code.

The current code is:

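def get_price(url):
    driver.get(url)
    time.sleep(10)  # crude fixed wait for the page to load
    # find_elements_* returns an empty list when nothing matches; it does not raise
    price_tags = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
    if not price_tags:
        print("price not found")

    for p in price_tags:
        print(p.text)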

When I run the above code, find_elements_by_xpath returns an empty list. I also tried other things, such as switching to the default content, because the element is inside a #document node:

driver.switch_to_default_content()

but this didn't work either. If there is any other way to scrape the price, please let me know.

It is an external SVG, and it seems Selenium doesn't have it in the DOM. So I had to find the <object> element, which holds the URL of the SVG file in its data attribute, download that file using requests, and extract the text using BeautifulSoup.

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup

url = 'https://www.kbb.com/cadillac/deville/1996/sedan-4d/'

driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)

# doesn't work - always empty list
#price_xpaths = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
#price_xpaths = driver.find_elements_by_xpath('//svg')
#price_xpaths = driver.find_elements_by_xpath('//svg//g//text')
#price_xpaths = driver.find_elements_by_xpath('//*[@id="PriceAdvisor"]')
#print(price_xpaths)  # always empty list

# single element `object`
# (find_element_by_xpath was removed in Selenium 4; there, use driver.find_element('xpath', ...))
svg_item = driver.find_element_by_xpath('//object[@id="PriceAdvisorFrame"]')

# doesn't work - always empty string
#print(svg_item.get_attribute('innerHTML'))

# get url to file SVG
svg_url = svg_item.get_attribute('data')
print(svg_url)  

# download it and parse
r = requests.get(svg_url)
soup = BeautifulSoup(r.content, 'html.parser')

text_items = soup.find_all('text')
for item in text_items:
    print(item.text)

Result:

Fair Market Range
$1,391 - $2,950
Fair Purchase Price
$2,171
Typical
Listing Price
$2,476
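
As a side note, the parsing step can also be sketched with only the standard library, which shows why plain paths like //svg//g//text find nothing: SVG elements live in an XML namespace. This is only a minimal sketch. SAMPLE_SVG is a made-up stand-in for the downloaded file (the real file's structure is an assumption), and extract_svg_text is a hypothetical helper name, not part of the code above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded SVG file.
# The structure of the real file is an assumption.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <text>Fair Market Range</text>
    <text>$1,391 - $2,950</text>
    <g>
      <text>Fair Purchase Price</text>
      <text><tspan>$2,171</tspan></text>
    </g>
  </g>
</svg>"""


def extract_svg_text(svg_markup):
    """Return the non-empty text of every <text> element, in document order."""
    root = ET.fromstring(svg_markup)
    texts = []
    for element in root.iter():
        # Parsed tags look like '{http://www.w3.org/2000/svg}text',
        # so strip the namespace part before comparing.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'text':
            # itertext() also collects text nested in children such as <tspan>.
            content = ''.join(element.itertext()).strip()
            if content:
                texts.append(content)
    return texts


for line in extract_svg_text(SAMPLE_SVG):
    print(line)
```

With the real download, r.content from the code above would be passed in place of SAMPLE_SVG, assuming the file is well-formed XML.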



BTW, for other users: I had to use a proxy/VPN with an IP address located in the US to see this page. From a location in PL (Poland) it displays:

Access Denied. 
You don't have permission to access "http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.

Sometimes I get this message even from a US location.
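
Because the block page can appear at any time, it may help to check the downloaded response before parsing it. The following is only a rough sketch: looks_like_access_denied is a hypothetical helper, the 403 status code is an assumption (I only saw the message text), and the marker string is taken from the message quoted above.

```python
def looks_like_access_denied(status_code, body):
    """Guess whether a response is the block page instead of real content.

    Assumptions: the server answers with HTTP 403 and/or the body
    contains the phrase 'Access Denied', as in the message quoted above.
    """
    return status_code == 403 or 'Access Denied' in body


# Hypothetical usage with the response from the earlier code:
#   r = requests.get(svg_url)
#   if looks_like_access_denied(r.status_code, r.text):
#       print('blocked - try again or use a different proxy')
blocked_page = 'Access Denied. You don\'t have permission to access this page.'
print(looks_like_access_denied(200, blocked_page))
print(looks_like_access_denied(200, '<svg><text>$2,171</text></svg>'))
```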
