
Python Selenium get attribute 'href' error

I am trying to get the href from a link. Here is my code:

url ='http://money.finance.sina.com.cn/bond/notice/sz149412.html'
link = driver.find_element_by_xpath("//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']").get_attribute('href')
print(link)

Error:

 invalid selector: Unable to locate an element with the xpath expression 
SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//div[@class='blk01'])//ul/li[3]//a[contains(text(),'发行信息']' is not a valid XPath expression.

It seems this is not a valid XPath, but I cannot figure out what is wrong with it. Any help will be appreciated!

Thanks

Try this instead:

link = driver.find_element_by_xpath('//div[@class="blk01"]//ul//li[3]//a[contains(text(), "发行信息")]')
print(link.get_attribute("href"))


//a[contains(text(),'发行信息')]

Even this would work.

print(link.get_attribute("href"))

# Importing necessary modules
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time

# WebDriver Chrome
driver = webdriver.Chrome(ChromeDriverManager().install())

# Target URL
url = 'http://money.finance.sina.com.cn/bond/notice/sz149412.html'
driver.get(url)
time.sleep(5)
link = driver.find_element_by_xpath('//*[@class="blue" and contains(text(),"发行信息")]').get_attribute('href')
print(link)

//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']

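does not seem to be a stable XPath, and its brackets are unbalanced: there is a stray ) right after [@class='blk01'], and the ) that closes contains( is missing before the final ]. This is the main problem. Keeping ' and " consistent also makes such slips easier to spot.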

Try this first:

find_element_by_xpath('//div[@class="blk01"]//ul//li[3]//a[contains(text(),"发行信息")]')

If it works, try just:

find_element_by_xpath('//a[contains(text(),"发行信息")]')

The goal is to make the XPath as short as possible.
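
Unbalanced brackets like the ones in the original expression can be caught before the string ever reaches the browser. Below is a rough sketch of such a check. The helper name find_bracket_error is hypothetical (it is not part of Selenium), and it only looks at bracket nesting and quoted literals, not at the full XPath grammar:

```python
def find_bracket_error(xpath):
    """Return a description of the first bracket problem, or None if balanced."""
    pairs = {')': '(', ']': '['}
    stack = []      # open brackets seen so far, with their positions
    quote = None    # the quote character we are currently inside, if any
    for pos, ch in enumerate(xpath):
        if quote:
            # Inside a string literal: only the matching quote ends it.
            if ch == quote:
                quote = None
        elif ch in ('"', "'"):
            quote = ch
        elif ch in '([':
            stack.append((ch, pos))
        elif ch in pairs:
            # A closing bracket must match the most recent open bracket.
            if not stack or stack[-1][0] != pairs[ch]:
                return f"unexpected {ch!r} at position {pos}"
            stack.pop()
    if quote:
        return f"unterminated {quote} string literal"
    if stack:
        ch, pos = stack[-1]
        return f"{ch!r} opened at position {pos} is never closed"
    return None


# The expression from the question and a corrected version of it.
broken = "//div[@class='blk01'])//ul//li[3]//a[contains(text(),'发行信息']"
fixed = "//div[@class='blk01']//ul//li[3]//a[contains(text(),'发行信息')]"

print(find_bracket_error(broken))  # reports the stray ')'
print(find_bracket_error(fixed))   # None
```

The check stops at the first problem it finds, so after removing the stray ) it would next report the ] that arrives while contains( is still open.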

Any particular reason to use Selenium here? The link is present in the HTML source, so it would be more efficient to use requests and beautifulsoup.

import requests
from bs4 import BeautifulSoup

url = 'http://money.finance.sina.com.cn/bond/notice/sz149412.html'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')


a_tag = soup.select_one('a:contains("发行信息")')
# a_tag = soup.select_one('a:-soup-contains("发行信息")')  # <- depending on which version of bs4 you have, the line above may throw an error since :contains is deprecated

link = a_tag['href']

Output:

print(link)
http://money.finance.sina.com.cn/bond/issue/sz149412.html
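
If installing bs4 is not an option, the same idea (read the link straight from the HTML source, no browser needed) can be sketched with the standard library alone. The class name LinkByTextParser and the sample markup are made up for illustration. The real page source may look different and would still have to be downloaded first:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkByTextParser(HTMLParser):
    """Collect the href of every <a> whose text contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []
        self._in_a = False  # True while we are inside an <a> element
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_a = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_a:
            if self._href and self.needle in "".join(self._text):
                self.matches.append(self._href)
            self._in_a = False


# Made-up sample markup, NOT the real page source.
sample_html = """
<div class="blk01"><ul>
  <li><a class="blue" href="/bond/quotes/sz149412.html">行情</a></li>
  <li><a class="blue" href="/bond/issue/sz149412.html">发行信息</a></li>
</ul></div>
"""
base_url = "http://money.finance.sina.com.cn/bond/notice/sz149412.html"

parser = LinkByTextParser("发行信息")
parser.feed(sample_html)

# Resolve relative hrefs against the page URL.
links = [urljoin(base_url, href) for href in parser.matches]
print(links)
```

urljoin is used so that a relative href in the page still ends up as an absolute URL.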
