Get the parent elements of a tag using Python requests-HTML

Question

Hi, is there a way to get all the parent elements of a tag using requests-HTML?

For example:

<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html>

I want to get all the parents of the b tag: [html, body, p]

Or, for the h1 tag, this result: [html, body]

Answer 1

Using the excellent lxml:

from lxml import etree
html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html> """
tree = etree.HTML(html)
# Find the first <b> element
b_elt = tree.xpath('//b')[0]
print(b_elt.text)
# -> "four"
# Walk up the ancestors of this <b> element, nearest parent first
ancestors_tags = [elt.tag for elt in b_elt.iterancestors()]
print(ancestors_tags)
# -> ['p', 'body', 'html']
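
The question asks for the ancestors starting from the root ([html, body, p]), while iterancestors() yields them starting from the nearest parent. A minimal sketch of one way to get the requested order, assuming a small helper named ancestor_tags (a hypothetical name, not part of lxml):

```python
from lxml import etree


def ancestor_tags(element):
    """Return the tag names of all ancestors of `element`, outermost first.

    `ancestor_tags` is an illustrative helper, not an lxml function.
    """
    # iterancestors() goes from the direct parent up to the root,
    # so reverse the list to start at the root instead.
    tags = [ancestor.tag for ancestor in element.iterancestors()]
    tags.reverse()
    return tags


html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html>"""

tree = etree.HTML(html)
b_tags = ancestor_tags(tree.xpath('//b')[0])
h1_tags = ancestor_tags(tree.xpath('//h1')[0])
print(b_tags)   # -> ['html', 'body', 'p']
print(h1_tags)  # -> ['html', 'body']
```

The same helper should work on the element attribute of a requests-HTML element, since that attribute is an lxml element.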

Answer 2

You can reach the lower-level lxml Element through the element attribute, which provides iterancestors().

Here is how you can do that:

from requests_html import HTML

html = """<!DOCTYPE html>
   <html lang="en">
   <body id="two">
       <h1 class="text-primary">hello there</h1>
       <p>one two tree<b>four</b>five</p>
    </body>
</html>"""
html = HTML(html=html)
b = html.find('b', first=True)
# b.element is the underlying lxml element; its ancestors come nearest parent first
parents = list(b.element.iterancestors())

Solution 1
1 Accepted 2019-03-12 15:26:02

Solution 2
0 2019-11-17 22:00:01
