How to extract text from SVG using Python Selenium
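I am trying to get the price from this link: https://www.kbb.com/cadillac/deville/1996/sedan-4d/
The price is displayed in a text tag inside an svg tag.
When I use the XPath .//*[name()='svg']//*[name()='g']//*[name()='text'] in the browser's Inspect Element, I am able to find the tag. But the same XPath does not work in code.
The current code is: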
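def get_price(url):
    # `driver` is a Selenium WebDriver instance created elsewhere
    driver.get(url)
    time.sleep(10)  # crude wait for the page to finish loading
    try:
        price_tags = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
    except Exception:
        print("price not found")
        return
    for p in price_tags:
        print(p.text)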
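When I run the above code, find_elements_by_xpath returns an empty list. I also tried other things, such as switching to the default content, because the element is under #document:
driver.switch_to_default_content()
But that didn't work either. Please let me know if there is another way to scrape the price.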
It is an external SVG, and it seems Selenium doesn't have it in the DOM, so I had to get the <object> element that holds the URL to this SVG file, read that URL from its data attribute, download the file using requests, and extract the text using BeautifulSoup.
from selenium import webdriver
import time
import requests
from bs4 import BeautifulSoup
url = 'https://www.kbb.com/cadillac/deville/1996/sedan-4d/'
driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)
# doesn't work - always empty list
#price_xpaths = driver.find_elements_by_xpath(".//*[name()='svg']//*[name()='g']//*[name()='text']")
#price_xpaths = driver.find_elements_by_xpath('//svg')
#price_xpaths = driver.find_elements_by_xpath('//svg//g//text')
#price_xpaths = driver.find_elements_by_xpath('//*[@id="PriceAdvisor"]')
#print(price_xpaths) # always empty list
# single element `object`
svg_item = driver.find_element_by_xpath('//object[@id="PriceAdvisorFrame"]')
# doesn't work - always empty string
#print(svg_item.get_attribute('innerHTML'))
# get url to file SVG
svg_url = svg_item.get_attribute('data')
print(svg_url)
# download it and parse
r = requests.get(svg_url)
soup = BeautifulSoup(r.content, 'html.parser')
text_items = soup.find_all('text')
for item in text_items:
    print(item.text)
Result:
Fair Market Range
$1,391 - $2,950
Fair Purchase Price
$2,171
Typical
Listing Price
$2,476
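The parsing step can also be tried on its own, without Selenium or network access. The following is only a minimal sketch using the standard library's xml.etree.ElementTree in place of BeautifulSoup, and it assumes the downloaded SVG is well-formed XML. The SAMPLE_SVG markup and the extract_svg_text helper are hypothetical names made up for illustration; the sample only mimics the svg > g > text nesting from the question and is not the real KBB file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (NOT the real KBB file); it only mimics the
# svg > g > text nesting described in the question.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g>
    <text>Fair Purchase Price</text>
    <text>$2,171</text>
  </g>
</svg>"""


def extract_svg_text(svg_markup):
    """Return the stripped text of every <text> element in an SVG document."""
    root = ET.fromstring(svg_markup)
    texts = []
    for element in root.iter():
        # Namespaced tags look like '{http://www.w3.org/2000/svg}text',
        # so keep only the part after the closing brace.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'text':
            # itertext() also collects text nested in child <tspan> elements.
            content = ''.join(element.itertext()).strip()
            if content:
                texts.append(content)
    return texts


if __name__ == '__main__':
    for line in extract_svg_text(SAMPLE_SVG):
        print(line)
```

In the script above, the same helper could presumably be called as extract_svg_text(r.content) in place of the BeautifulSoup lines, provided the server returns valid XML; BeautifulSoup with 'html.parser' is more forgiving of malformed markup.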
BTW, for other users: I had to use a proxy/VPN with an IP located in the US to see this page. For location PL it displays:
Access Denied.
You don't have permission to access "http://www.kbb.com/cadillac/deville/1996/sedan-4d/" on this server.
Sometimes it gives me this message even for a location in the US.