How to extract text from SVG using Python Selenium

I'm trying to scrape the price from this link: https://www.kbb.com/cadillac/deville/1996/sedan-4d/ (the page shows the price range in an image).

The prices are shown in text tags inside an svg tag.

When I use the XPath .//*[name()='svg']//*[name()='g']//*[name()='text'] in the browser's inspect element, I'm able to find the tags. But the same XPath does not work in the code.

The current code is:

def get_price(url):
    driver.get(url)
    time.sleep(10)
    try:
        # find every <text> node inside the SVG
        price_tags = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
    except:
        print("price not found")
        return

    for p in price_tags:
        print(p.text)

When I run the above code, find_elements_by_xpath returns an empty list. I also tried other things, such as switching back to the default content, because the element sits inside a #document:

driver.switch_to_default_content()

but that didn't work either. If there is any other way to scrape the price, please let me know.
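
For reference, here is a minimal sketch of the idea of reading the <object>'s embedded document through JavaScript. It is only a sketch: it assumes the SVG document is same-origin (which is not confirmed here, and is exactly what the answer below works around), and the id PriceAdvisorFrame is taken from that answer.

# Sketch only: read <text> nodes from the <object>'s embedded SVG document.
# Returns an empty list if the document is cross-origin or not loaded yet.
texts = driver.execute_script("""
    var obj = document.getElementById('PriceAdvisorFrame');
    if (!obj || !obj.contentDocument) { return []; }
    var nodes = obj.contentDocument.querySelectorAll('text');
    return Array.prototype.map.call(nodes, function (n) { return n.textContent; });
""")
print(texts)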

It is an external SVG and it seems Selenium doesn't have it in the DOM, so I had to get the <object> element that points to this SVG file, read its URL from the data attribute, download it using requests, and get the text using BeautifulSoup.

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup

url = 'https://www.kbb.com/cadillac/deville/1996/sedan-4d/'

driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)

# doesn't work - always empty list
#price_xpaths = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
#price_xpaths = driver.find_elements_by_xpath('//svg')
#price_xpaths = driver.find_elements_by_xpath('//svg//g//text')
#price_xpaths = driver.find_elements_by_xpath('//*[@id="PriceAdvisor"]')
#print(price_xpaths)  # always empty list

# single element `object`
svg_item = driver.find_element_by_xpath('//object[@id="PriceAdvisorFrame"]')

# doesn't work - always empty string
#print(svg_item.get_attribute('innerHTML'))

# get url to file SVG
svg_url = svg_item.get_attribute('data')
print(svg_url)  

# download it and parse
r = requests.get(svg_url)
soup = BeautifulSoup(r.content, 'html.parser')

text_items = soup.find_all('text')
for item in text_items:
    print(item.text)
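
Side note: the find_element_by_* / find_elements_by_* helpers used above belong to the old Selenium API and are gone in current Selenium 4 releases; there the same lookup would be written like this (same XPath, only the call changes):

from selenium.webdriver.common.by import By

# Selenium 4 style for the same lookup (a sketch, not re-tested against this page)
svg_item = driver.find_element(By.XPATH, '//object[@id="PriceAdvisorFrame"]')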

Result of the script above:

Fair Market Range
$1,391 - $2,950
Fair Purchase Price
$2,171
Typical
Listing Price
$2,476
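
If you only want the dollar figures as values, a small follow-up on top of text_items could look like this (a sketch, assuming the labels keep this "$1,234" format):

import re

# Pull only the dollar amounts out of the <text> nodes.
amounts = []
for item in text_items:
    amounts.extend(re.findall(r'\$[\d,]+', item.text))
print(amounts)  # e.g. ['$1,391', '$2,950', '$2,171', '$2,476']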

BTW, information for other users: I had to use a proxy/VPN with an IP located in the US to see this page. For a location in PL it displays:

Access Denied. 
You don't have permission to access "http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.

Sometimes it gives me this message even for a US location.
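
If you need the requests download to go out through a US proxy as well, a minimal sketch (the address below is a placeholder, not a real endpoint):

# Route the SVG download through an HTTP(S) proxy; replace the placeholder
# 'us-proxy.example.com:8080' with your own proxy/VPN endpoint.
proxies = {
    'http': 'http://us-proxy.example.com:8080',
    'https': 'http://us-proxy.example.com:8080',
}
r = requests.get(svg_url, proxies=proxies)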
