Get the parent elements of a tag using Python requests-HTML
Hi, is there any way to get all the parent elements of a tag using requests-HTML?
For example:
<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html>
I want to get all the parents of the b tag: [html, body, p]
Or, for the h1 tag, get this result: [html, body]
Using the excellent lxml:
from lxml import etree
html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html> """
tree = etree.HTML(html)
# Find the first <b> element
b_elt = tree.xpath('//b')[0]
print(b_elt.text)
# -> four
# Walk up through the ancestors of this <b> element (nearest ancestor first)
ancestors_tags = [elt.tag for elt in b_elt.iterancestors()]
print(ancestors_tags)
# -> ['p', 'body', 'html']
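If you need the root-first order from the question ([html, body, p]), one option is the XPath ancestor axis instead of iterancestors(). This is a minimal sketch, assuming lxml is installed; the variable names are only illustrative. lxml returns XPath node sets in document order, so the outermost ancestor comes first:

```python
from lxml import etree

html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html>"""

tree = etree.HTML(html)

# 'ancestor::*' selects every element ancestor of the context node.
b_elt = tree.xpath('//b')[0]
b_ancestors = [elt.tag for elt in b_elt.xpath('ancestor::*')]

h1_elt = tree.xpath('//h1')[0]
h1_ancestors = [elt.tag for elt in h1_elt.xpath('ancestor::*')]

print(b_ancestors)   # root-first list of tag names for <b>
print(h1_ancestors)  # root-first list of tag names for <h1>
```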
You can reach the lower-level lxml Element through the element attribute, and that Element has an iterancestors() method.
Here is how you can do it:
from requests_html import HTML
html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html>"""
html = HTML(html=html)
b = html.find('b', first=True)
# b.element is the underlying lxml element; iterancestors() yields the nearest ancestor first
parents = list(b.element.iterancestors())
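The parents list holds lxml elements, nearest ancestor first, rather than the [html, body, p] list of names the question asks for. One way to bridge that gap is to collect the tag names and reverse them. In the sketch below, ancestor_tags is a hypothetical helper name, and the demo builds its elements with lxml directly so it runs without requests-HTML. With requests-HTML you would pass b.element to the helper instead:

```python
from lxml import etree


def ancestor_tags(element):
    """Return the tag names of an lxml element's ancestors, outermost first."""
    tags = [ancestor.tag for ancestor in element.iterancestors()]
    tags.reverse()  # iterancestors() starts at the direct parent
    return tags


# Demo document (same markup as in the question), parsed with lxml only.
doc = etree.HTML("""<!DOCTYPE html>
<html lang="en">
<body id="two">
<h1 class="text-primary">hello there</h1>
<p>one two tree<b>four</b>five</p>
</body>
</html>""")

b_tags = ancestor_tags(doc.xpath('//b')[0])
h1_tags = ancestor_tags(doc.xpath('//h1')[0])
print(b_tags)
print(h1_tags)
```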