Python Selenium get attribute 'href' error
I am trying to get the href from a link. Here is my code:
url ='http://money.finance.sina.com.cn/bond/notice/sz149412.html'
link = driver.find_element_by_xpath("//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']").get_attribute('href')
print(link)
Error:
invalid selector: Unable to locate an element with the xpath expression
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@class='blk01'])//ul/li[3]//a[contains(text(),'发行信息']' is not a valid XPath expression.
It seems the XPath is not valid, but I cannot figure out the error. Any help will be appreciated!
Thanks
Try this instead:
link = driver.find_element_by_xpath('//div[@class="blk01"]//ul//li[3]//a[contains(text(), "发行信息")]')
print(link.get_attribute("href"))
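For reference, the original expression differs from the corrected one in two places: it has a stray `)` right after `[@class='blk01']`, and the `)` that should close `contains(` is missing before the final `]`. A minimal sketch of a bracket check that points at the first such problem is shown below. `first_bracket_error` is a hypothetical helper written for this illustration, not part of Selenium, and it only checks bracket balance, not full XPath validity.

```python
def first_bracket_error(xpath):
    """Return the index of the first unbalanced bracket in xpath, or None."""
    pairs = {")": "(", "]": "["}
    stack = []      # (opening character, index) pairs still waiting for a close
    quote = None    # the quote character we are currently inside, if any
    for i, ch in enumerate(xpath):
        if quote:
            # Inside a string literal: ignore everything until the quote closes.
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([":
            stack.append((ch, i))
        elif ch in pairs:
            # A closer must match the most recent opener.
            if not stack or stack[-1][0] != pairs[ch]:
                return i
            stack.pop()
    # Any opener left over was never closed.
    return stack[0][1] if stack else None


broken = "//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']"
fixed = "//div[@class='blk01']//ul//li[3]//a[contains(text(),'发行信息')]"

print(first_bracket_error(broken))  # 21, the stray ")" after [@class='blk01']
print(first_bracket_error(fixed))   # None
```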
//a[contains(text(),'发行信息')]
Even this would work.
print(link.get_attribute("href"))
# Importing necessary modules
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
# WebDriver Chrome
driver = webdriver.Chrome(ChromeDriverManager().install())
# Target URL
url = 'http://money.finance.sina.com.cn/bond/notice/sz149412.html'
driver.get(url)
time.sleep(5)
link = driver.find_element_by_xpath('//*[@class="blue" and contains(text(),"发行信息")]').get_attribute('href')
print(link)
//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']
does not seem to be a stable XPath, and its brackets are mixed up: there is a stray ) after [@class='blk01'], and the ) that should close contains() is missing. This is the main problem.
Try this first:
find_element_by_xpath('//div[@class="blk01"]//ul//li[3]//a[contains(text(),"发行信息")]')
If it works, try just:
find_element_by_xpath('//a[contains(text(),"发行信息")]')
The goal is to make the XPath as short as possible.
Any particular reason to use Selenium here? The link is present in the HTML source, so it would be more efficient to use requests and BeautifulSoup.
import requests
from bs4 import BeautifulSoup
url = 'http://money.finance.sina.com.cn/bond/notice/sz149412.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
a_tag = soup.select_one('a:contains("发行信息")')
# a_tag = soup.select_one('a:-soup-contains("发行信息")')  # depending on your bs4 version, the line above may warn or fail because ':contains' is deprecated; use this form instead
link = a_tag['href']
Output:
print(link)
http://money.finance.sina.com.cn/bond/issue/sz149412.html
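If installing third-party packages is not an option, the same idea (find the `<a>` whose text contains the label, then read its `href`) can be sketched with the standard library's `html.parser`. This swaps BeautifulSoup for a hand-written parser. `LinkTextFinder` is a hypothetical class, and `sample_html` is a made-up fragment standing in for the downloaded page; only the second link's `href` comes from the output above.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collect href values of <a> tags whose text contains a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []
        self._in_a = False   # True while we are inside an <a> element
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            if self._href and self.needle in "".join(self._text):
                self.matches.append(self._href)
            self._in_a = False


# Made-up fragment for illustration; the real page would come from requests.get(url).text.
sample_html = """<div class="blk01"><ul>
<li><a class="blue" href="/placeholder.html">基本资料</a></li>
<li><a class="blue" href="http://money.finance.sina.com.cn/bond/issue/sz149412.html">发行信息</a></li>
</ul></div>"""

finder = LinkTextFinder("发行信息")
finder.feed(sample_html)
print(finder.matches)
```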