Select parent of specific node using xpath/python
How do I get the href value for the a in this snippet of HTML? I need to get it based on the class on the i tag.
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i> </a>
-->
I tried this, but I am getting no results:
foo_links = tree.xpath('//a[i/@class="foobar"]')
Your code does work for me; it returns a list of <a> elements. If you want a list of hrefs rather than the elements themselves, add /@href:
hrefs = tree.xpath('//a[i/@class="foobar"]/@href')
You could also first find the <i>s, then use /parent::* (or simply /..) to get back to the <a>s.
hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')
#                       ^                  ^  ^
#                       |                  |  obtain the 'href'
#                       |                  |
#                       |                  get the parent of the <i>
#                       |
#                       find all <i class="foobar"> contained in an <a>.
If all of these don't work, you may want to verify that the structure of the document is correct.
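To double-check the XPath itself, here is a minimal runnable sketch (assuming lxml is installed) that applies both queries to an uncommented copy of the snippet from the question:

```python
import lxml.html

# An uncommented copy of the snippet from the question.
html = '<div><a href="https://link.com" target="_blank"><i class="foobar"></i> </a></div>'
tree = lxml.html.fromstring(html)

# Filter <a> elements on the class of their <i> child, then read the href.
hrefs = tree.xpath('//a[i/@class="foobar"]/@href')

# Equivalent: find the <i> first, then step back up to the parent <a>.
hrefs_via_parent = tree.xpath('//a/i[@class="foobar"]/../@href')

print(hrefs)             # ['https://link.com']
print(hrefs_via_parent)  # ['https://link.com']
```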
Note that XPath won't peek inside comments <!-- -->. If the <a> is indeed inside a comment <!-- -->, you need to manually extract the document out first.
import lxml.html  # needed to re-parse the comment text

hrefs = [href
         for comment in tree.xpath('//comment()')        # find all comments
         for href in lxml.html.fromstring(comment.text)  # parse the comment's content as a new HTML document
                    .xpath('//a[i/@class="foobar"]/@href')]  # read those hrefs
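As a self-contained check (a sketch, assuming lxml is installed and treating the snippet from the question as the whole document):

```python
import lxml.html

# The snippet from the question, with the <a> inside an HTML comment.
doc = '<div><!-- <a href="https://link.com" target="_blank"><i class="foobar"></i> </a> --></div>'
tree = lxml.html.fromstring(doc)

hrefs = [href
         for comment in tree.xpath('//comment()')        # find all comments
         for href in lxml.html.fromstring(comment.text)  # re-parse the comment's content
                    .xpath('//a[i/@class="foobar"]/@href')]

print(hrefs)  # ['https://link.com']
```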
You should note that the target element is an HTML comment. You cannot simply get the <a> from a comment with XPath like "//a", because in this case it is not a node, but a plain string.
Try the code below:
import re

foo_links = tree.xpath('//comment()')  # get a list of all comments on the page
for link in foo_links:
    if '<i class="foobar">' in link.text:
        href = re.search(r'\w+://\w+\.\w+', link.text).group(0)  # get the href value from the matching comment
        break
PS: You might need to use a more complex regular expression to match the link URL.
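Put together, the approach above can be run end to end like this (a sketch, assuming lxml is installed and treating the snippet from the question as the whole document):

```python
import re
import lxml.html

doc = '<div><!-- <a href="https://link.com" target="_blank"><i class="foobar"></i> </a> --></div>'
tree = lxml.html.fromstring(doc)

href = None
for comment in tree.xpath('//comment()'):  # every comment node on the page
    if '<i class="foobar">' in comment.text:
        # \w+://\w+\.\w+ is only a rough URL pattern; real-world URLs may need more.
        href = re.search(r'\w+://\w+\.\w+', comment.text).group(0)
        break

print(href)  # https://link.com
```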