XPath - get the "href" attribute

Question

How can I get the "href" attribute from HTML using XPath?

<td>
    <a href="http://www.stackoverflow.com">
       <p>SERVER-45472</p>
    </a>
</td>

Why does only the query "//a/@href" work for me? Why can't I use the query "/td//a/@href"?

What I am trying to do:

from lxml import html

tree = html.fromstring('<td><a href="https://jira.mongodb.org/browse/SERVER-45472"><p>SERVER-45472</p></a></td>')

a = tree.xpath('/td//a/@href')
print(a)

After running the script, I get an empty list.

Answer 1

Use an XPath like this:

string(//a/@href)
http://www.stackoverflow.com

Your XPath partially works for me with xmllint:

xmllint --xpath '/td//a/@href' file
href="http://www.stackoverflow.com"

Which tools are you using, what output do you expect, and what do you get instead?
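
Since the question uses lxml, the string() form can be tried there too. This is a minimal sketch, assuming lxml is installed; the variable names hrefs and first_href are illustrative and not from the original post. It shows that a plain attribute query yields a list, while wrapping it in string() yields a single string.

```python
from lxml import html

# Same fragment as in the question.
tree = html.fromstring(
    '<td><a href="https://jira.mongodb.org/browse/SERVER-45472">'
    '<p>SERVER-45472</p></a></td>'
)

# A plain attribute query returns a list with one entry per match.
hrefs = tree.xpath('//a/@href')

# string() converts the node-set to the string value of its first node,
# or to an empty string when nothing matches.
first_href = tree.xpath('string(//a/@href)')

print(hrefs)
print(first_href)
```

The string() form is convenient when exactly one link is expected; the list form is safer when there may be several or none.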

Answer 2

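Because a fragment of an HTML file is not a valid HTML document. The parser places the fragment inside html and body elements, so an absolute path has to start from /html/body rather than from /td.

See: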

$ python
Python 2.7.16 (default, Oct 10 2019, 22:02:15) 
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import html
>>> tree = html.fromstring('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"><html>  <body><td><a href="https://jira.mongodb.org/browse/SERVER-45472"><p>SERVER-45472</p></a></td></body></html>')
>>> a = tree.xpath('/html/body/td//a/@href')
>>> print(a)
['https://jira.mongodb.org/browse/SERVER-45472']
>>>
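
The same point can be checked on the original fragment without adding a doctype by hand. The sketch below assumes lxml is installed and uses getroottree().getpath() to ask where the returned element sits in the parsed document; the variable names are illustrative. It compares the absolute query from the question with a relative query and with an absolute query built from the reported location.

```python
from lxml import html

# The fragment from the question, parsed without a surrounding document.
tree = html.fromstring(
    '<td><a href="https://jira.mongodb.org/browse/SERVER-45472">'
    '<p>SERVER-45472</p></a></td>'
)

# Ask lxml for the absolute location of the element that fromstring() returned.
location = tree.getroottree().getpath(tree)
print(tree.tag, location)

# Absolute path from the question: evaluated from the document root, not from <td>.
from_question = tree.xpath('/td//a/@href')

# Relative path: evaluated from the element held in `tree`.
relative = tree.xpath('.//a/@href')

# Absolute path that begins at the element's real location in the document.
absolute = tree.xpath(location + '//a/@href')

print(from_question)
print(relative)
print(absolute)
```

A relative query such as .//a/@href is usually the least fragile choice, because it does not depend on what the parser wraps around the fragment.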

2 answers

Answer 1
1 2020-07-06 11:32:23

Answer 2
1 accepted 2020-07-06 11:59:06
