Selecting the parent of a specific node with XPath / Python

Question

How can I get the href value of the <a> in this HTML snippet?

I need to select it based on the class of the <i> tag.

<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>           
-->

I tried this, but it returns nothing:

foo_links = tree.xpath('//a[i/@class="foobar"]')

Answer 1

Your code does work for me: it returns a list of <a> elements. If you want a list of hrefs rather than the elements themselves, append /@href:

hrefs = tree.xpath('//a[i/@class="foobar"]/@href')

You could also find the <i> first, then use /parent::* (or simply /..) to get back to the <a>.

hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')
#                     ^                    ^  ^
#                     |                    |  obtain the 'href'
#                     |                    |
#                     |                    get the parent of the <i>
#                     |
#                     find all <i class="foobar"> contained in an <a>.
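
To check that the two approaches above agree, here is a minimal self-contained sketch. It assumes lxml is installed; the sample markup and the extra URLs are invented for illustration and are not from the question.

```python
import lxml.html

# Hypothetical sample document; the markup and extra URLs are invented.
SAMPLE = """
<html><body>
  <a href="https://link.com" target="_blank"><i class="foobar"></i></a>
  <a href="https://other.example"><i class="baz"></i></a>
  <a href="https://plain.example">no icon</a>
</body></html>
"""

tree = lxml.html.fromstring(SAMPLE)

# Filter the <a> elements with a predicate, then read the href attribute.
via_predicate = tree.xpath('//a[i/@class="foobar"]/@href')

# Find the <i> first, then step up to its parent with the explicit axis.
via_parent_axis = tree.xpath('//i[@class="foobar"]/parent::a/@href')

# Same idea, using ".." as the abbreviation for the parent step.
via_dotdot = tree.xpath('//a/i[@class="foobar"]/../@href')

print(via_predicate, via_parent_axis, via_dotdot)
```

All three queries select only the link whose <i> child has the class foobar. Which form to use is mostly a matter of taste; the predicate form tends to read more naturally when the result you want is the <a> itself.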

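If none of this works, you may want to verify that the document is structured the way you expect.

Note that XPath does not look inside comments. If the <a> really is inside a comment, you need to extract it from the document manually first.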

import lxml.html

hrefs = [href for comment in tree.xpath('//comment()')
              # find all comments
              for href in lxml.html.fromstring(comment.text)
              # parse content of comment as a new HTML file
                              .xpath('//a[i/@class="foobar"]/@href')
                              # read those hrefs.
]
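
For a runnable picture of the comment case, the sketch below wraps the question's snippet in a small page (the surrounding html/body markup is an assumption) and compares a direct query with the comment-based one. It assumes lxml is installed and skips empty comments, since lxml.html.fromstring raises an error on empty input.

```python
import lxml.html

# The question's snippet inside a hypothetical page; the wrapper markup
# is an assumption made for illustration.
PAGE = """
<html><body>
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>
-->
</body></html>
"""

tree = lxml.html.fromstring(PAGE)

# The <a> is only text inside a comment node, so a direct query finds nothing.
direct = tree.xpath('//a[i/@class="foobar"]/@href')

# Re-parse each comment's text as its own fragment and query that instead.
from_comments = []
for comment in tree.xpath('//comment()'):
    if not (comment.text and comment.text.strip()):
        continue  # nothing to parse in an empty comment
    fragment = lxml.html.fromstring(comment.text)
    from_comments.extend(fragment.xpath('//a[i/@class="foobar"]/@href'))

print(direct)
print(from_comments)
```

The explicit loop does the same work as the list comprehension, just with room for the empty-comment guard.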

Answer 2

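Note that the target element is inside an HTML comment. You cannot simply get the <a> out of a comment with an XPath such as "//a", because in this case it is not a node but a plain string.

Try the code below: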

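import re

foo_links = tree.xpath('//comment()')  # get list of all comments on page
href = None  # stays None if no matching comment is found
for link in foo_links:
    if '<i class="foobar">' in link.text:
        # raw string, with the dot escaped so it matches a literal "."
        href = re.search(r'\w+://\w+\.\w+', link.text).group(0)  # get href value from required comment
        break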

P.S. You may need a more complex regular expression to match the link URL.
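
One possible direction, as a rough sketch rather than a definitive pattern: instead of guessing the shape of the URL, capture whatever sits between the quotes of the href attribute. The names HREF_RE and first_href are hypothetical, and the sample comment text (including its path and query string) is invented for illustration.

```python
import re

# Hypothetical pattern: capture the quoted value of the first href attribute.
# Group 1 is the opening quote, group 2 the value, \1 the matching close quote.
HREF_RE = re.compile(r'''href\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)

def first_href(comment_text):
    """Return the first quoted href value in comment_text, or None."""
    match = HREF_RE.search(comment_text)
    return match.group(2) if match else None

# Invented sample resembling the text of the comment in the question.
comment_text = (
    '\n<a href="https://link.com/path?x=1" target="_blank">'
    '<i class="foobar"></i>  </a>\n'
)
print(first_href(comment_text))
```

This keeps paths and query strings that a pattern like \w+://\w+\.\w+ would cut off, but it still ignores unquoted attribute values, and regular expressions remain fragile on HTML in general. Re-parsing the comment text with an HTML parser is usually the sturdier option.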

Answer 1
1 2017-04-13 15:17:44

Answer 2
0 2017-04-13 15:35:23