How to extract text from SVG using Python Selenium

I'm trying to scrape the price from this link: https://www.kbb.com/cadillac/deville/1996/sedan-4d/ (the page shows the price range in an image).

The prices are shown in text tags inside an svg tag.

When I use the XPath .//*[name()='svg']//*[name()='g']//*[name()='text'] in the browser's inspect element, I'm able to find the tags. But the same XPath does not work in the code.

The current code is:

def get_price(url):
    driver.get(url)
    time.sleep(10)
    try:
        # find every <text> node inside the SVG
        price_tags = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
    except:
        print("price not found")
        return

    for p in price_tags:
        print(p.text)

When I run the above code, find_elements_by_xpath returns an empty list. I also tried other things, such as switching back to the default content, because the element sits inside a #document:

driver.switch_to_default_content()

but that didn't work either. If there is any other way to scrape the price, please let me know.
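
For reference, here is a minimal sketch of the idea of reading the <object>'s embedded document through JavaScript. It is only a sketch: it assumes the SVG document is same-origin (which is not confirmed here, and is exactly what the answer below works around), and the id PriceAdvisorFrame is taken from that answer.

# Sketch only: read <text> nodes from the <object>'s embedded SVG document.
# Returns an empty list if the document is cross-origin or not loaded yet.
texts = driver.execute_script("""
    var obj = document.getElementById('PriceAdvisorFrame');
    if (!obj || !obj.contentDocument) { return []; }
    var nodes = obj.contentDocument.querySelectorAll('text');
    return Array.prototype.map.call(nodes, function (n) { return n.textContent; });
""")
print(texts)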

It is an external SVG and it seems Selenium doesn't have it in the DOM, so I had to get the <object> element that points to this SVG file, read its URL from the data attribute, download it using requests, and get the text using BeautifulSoup.

from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup

url = 'https://www.kbb.com/cadillac/deville/1996/sedan-4d/'

driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)

# doesn't work - always empty list
#price_xpaths = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
#price_xpaths = driver.find_elements_by_xpath('//svg')
#price_xpaths = driver.find_elements_by_xpath('//svg//g//text')
#price_xpaths = driver.find_elements_by_xpath('//*[@id="PriceAdvisor"]')
#print(price_xpaths)  # always empty list

# single element `object`
svg_item = driver.find_element_by_xpath('//object[@id="PriceAdvisorFrame"]')

# doesn't work - always empty string
#print(svg_item.get_attribute('innerHTML'))

# get url to file SVG
svg_url = svg_item.get_attribute('data')
print(svg_url)  

# download it and parse
r = requests.get(svg_url)
soup = BeautifulSoup(r.content, 'html.parser')

text_items = soup.find_all('text')
for item in text_items:
    print(item.text)
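
Side note: the find_element_by_* / find_elements_by_* helpers used above belong to the old Selenium API and are gone in current Selenium 4 releases; there the same lookup would be written like this (same XPath, only the call changes):

from selenium.webdriver.common.by import By

# Selenium 4 style for the same lookup (a sketch, not re-tested against this page)
svg_item = driver.find_element(By.XPATH, '//object[@id="PriceAdvisorFrame"]')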

Result of the script above:

Fair Market Range
$1,391 - $2,950
Fair Purchase Price
$2,171
Typical
Listing Price
$2,476
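
If you only want the dollar figures as values, a small follow-up on top of text_items could look like this (a sketch, assuming the labels keep this "$1,234" format):

import re

# Pull only the dollar amounts out of the <text> nodes.
amounts = []
for item in text_items:
    amounts.extend(re.findall(r'\$[\d,]+', item.text))
print(amounts)  # e.g. ['$1,391', '$2,950', '$2,171', '$2,476']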

BTW, information for other users: I had to use a proxy/VPN with an IP located in the US to see this page. For a location in PL it displays:

Access Denied. 
You don't have permission to access "http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.

Sometimes it gives me this message even for a US location.
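
If you need the requests download to go out through a US proxy as well, a minimal sketch (the address below is a placeholder, not a real endpoint):

# Route the SVG download through an HTTP(S) proxy; replace the placeholder
# 'us-proxy.example.com:8080' with your own proxy/VPN endpoint.
proxies = {
    'http': 'http://us-proxy.example.com:8080',
    'https': 'http://us-proxy.example.com:8080',
}
r = requests.get(svg_url, proxies=proxies)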
