Selecting the parent node of a specific node using XPath / Python

Question

How do I get the href value of the <a> in this snippet of HTML?

I need to get it based on the class of the <i> tag.

<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>           
-->

I tried this, but I am getting no results:

foo_links = tree.xpath('//a[i/@class="foobar"]')

Answer 1

Your code does work for me: it returns a list of <a> elements. If you want a list of hrefs rather than the elements themselves, add /@href:

hrefs = tree.xpath('//a[i/@class="foobar"]/@href')
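To check this outside the real page, here is a minimal sketch that runs both expressions against a small sample. The SAMPLE markup (including the second link and its URL) is made up for illustration, and it assumes the tree was built with lxml.html.fromstring, which the question does not state.

```python
import lxml.html

# Hypothetical sample: the question's snippet without the comment markers,
# plus a second link that should NOT match.
SAMPLE = '''
<div>
  <a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>
  <a href="https://other.example" target="_blank"><i class="baz"></i></a>
</div>
'''

tree = lxml.html.fromstring(SAMPLE)

# Selects the <a> elements that have an <i class="foobar"> child.
elements = tree.xpath('//a[i/@class="foobar"]')

# Same selection, but returns the href attribute values as strings.
hrefs = tree.xpath('//a[i/@class="foobar"]/@href')

print([el.tag for el in elements])
print(hrefs)
```

If this works on the sample but not on your page, the problem is most likely in the document itself rather than in the XPath.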

You could also first find the <i> elements, then use /parent::* (or simply /..) to get back to the <a> elements.

hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')
#                     ^                    ^  ^
#                     |                    |  obtain the 'href'
#                     |                    |
#                     |                    get the parent of the <i>
#                     |
#                     find all <i class="foobar"> contained in an <a>.
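As a rough sketch of the parent step, the two spellings can be compared side by side. The SAMPLE string is a made-up stand-in for the real page, and the last variant uses lxml's getparent() to do the same walk from Python instead of inside the XPath.

```python
import lxml.html

# Hypothetical sample markup based on the question's snippet.
SAMPLE = ('<div><a href="https://link.com" target="_blank">'
          '<i class="foobar"></i>  </a></div>')
tree = lxml.html.fromstring(SAMPLE)

# Abbreviated parent step.
via_dotdot = tree.xpath('//a/i[@class="foobar"]/../@href')

# Explicit parent axis; selects the same nodes as '..'.
via_parent = tree.xpath('//a/i[@class="foobar"]/parent::*/@href')

# Same idea in Python: find the <i> elements, then step up with getparent().
via_python = [i.getparent().get('href')
              for i in tree.xpath('//a/i[@class="foobar"]')]

print(via_dotdot, via_parent, via_python)
```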

If none of these work, you may want to verify that the structure of the document is what you expect.

Note that XPath won't peek inside comments. If the <a> is indeed inside a comment, you need to extract the comment's content and parse it separately first.

import lxml.html

hrefs = [href
         # find all comments
         for comment in tree.xpath('//comment()')
         # parse the content of each comment as a new HTML document
         for href in lxml.html.fromstring(comment.text)
                              # read the hrefs from it
                              .xpath('//a[i/@class="foobar"]/@href')]
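The same idea written as a plain loop may be easier to debug. This is only a sketch: the PAGE string is a made-up wrapper around the question's snippet, and skipping blank comments is my own precaution, since as far as I know lxml.html.fromstring fails on empty input.

```python
import lxml.html

# Hypothetical page: the question's snippet, comment markers included.
PAGE = '''
<div>
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>
-->
</div>
'''
tree = lxml.html.fromstring(PAGE)

# The direct query finds nothing, because the <a> only exists as comment text.
direct = tree.xpath('//a[i/@class="foobar"]/@href')

hrefs = []
for comment in tree.xpath('//comment()'):
    text = (comment.text or '').strip()
    if not text:
        continue  # assumption: skip blank comments rather than parse them
    # Parse the comment's content as its own HTML document.
    fragment = lxml.html.fromstring(text)
    hrefs.extend(fragment.xpath('//a[i/@class="foobar"]/@href'))

print(direct)
print(hrefs)
```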

Answer 2

You should note that the target element is inside an HTML comment. You cannot get the <a> from a comment with an XPath like "//a", because in this case it is not a node but a plain string.

Try the code below:

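import re

comments = tree.xpath('//comment()')  # get a list of all comments on the page
href = None  # stays None if no matching comment is found
for comment in comments:
    if '<i class="foobar">' in comment.text:
        # get the URL from the required comment (raw string, literal dot)
        match = re.search(r'\w+://\w+\.\w+', comment.text)
        if match:
            href = match.group(0)
            break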

P.S. You might need a more complex regular expression to match the link URL.
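To illustrate why, here is a small sketch comparing the simple pattern with one that captures the whole href attribute value. The comment_text string and its URL path are invented for the example, and the second pattern assumes the attribute is written with double quotes.

```python
import re

# Hypothetical comment text, shaped like what comment.text would return.
comment_text = ('\n<a href="https://link.com/path?q=1" target="_blank">'
                '<i class="foobar"></i>  </a>\n')

# The simple pattern stops after the first dot-separated host label.
narrow = re.search(r'\w+://\w+\.\w+', comment_text).group(0)

# Capturing everything between the quotes of href="..." keeps the full URL.
wide = re.search(r'href="([^"]+)"', comment_text).group(1)

print(narrow)
print(wide)
```

Parsing the comment with an HTML parser is usually more robust than either pattern, but the second one is often enough for a quick script.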

Answer 1
1 2017-04-13 15:17:44

Answer 2
0 2017-04-13 15:35:23
