
Get the parent elements of a tag using Python requests-HTML

Hi, is there any way to get all the parent elements of a tag using requests-HTML?

For example:

<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html> 

I want to get all the parents of the b tag: [html, body, p]

Or, for the h1 tag, get this result: [html, body]

With the excellent lxml:

from lxml import etree
html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html> """
tree = etree.HTML(html)
# Find the first <b> element
b_elt = tree.xpath('//b')[0]
print(b_elt.text)
# -> four
# Walk up through the ancestors of this <b> element (nearest parent first)
ancestors_tags = [elt.tag for elt in b_elt.iterancestors()]
print(ancestors_tags)
# -> ['p', 'body', 'html']
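
Note that iterancestors() starts at the nearest parent, so the list comes out in the opposite order from the one asked for in the question. Below is a minimal sketch of reversing it, assuming the same sample markup. The name parent_tags is a hypothetical helper, not part of lxml:

```python
from lxml import etree

html = """<!DOCTYPE html>
<html lang="en">
<body id="two">
    <h1 class="text-primary">hello there</h1>
    <p>one two tree<b>four</b>five</p>
</body>
</html>"""


def parent_tags(element):
    """Return the ancestor tag names of an lxml element, outermost first.

    parent_tags is a hypothetical helper name, not an lxml function.
    """
    # iterancestors() walks upward (nearest parent first), so reverse the result.
    return [ancestor.tag for ancestor in element.iterancestors()][::-1]


tree = etree.HTML(html)
b_parents = parent_tags(tree.xpath('//b')[0])
h1_parents = parent_tags(tree.xpath('//h1')[0])
print(b_parents)   # -> ['html', 'body', 'p']
print(h1_parents)  # -> ['html', 'body']
```

The helper returns tag names only; drop the .tag access if you need the elements themselves.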

You can access the lower-level lxml Element via the element attribute, which has an iterancestors() method.

Here is how you could do it:

from requests_html import HTML

html = """<!DOCTYPE html>
   <html lang="en">
   <body id="two">
       <h1 class="text-primary">hello there</h1>
       <p>one two tree<b>four</b>five</p>
    </body>
</html>"""
html = HTML(html=html)
b = html.find('b', first=True)
# b.element is the underlying lxml element; iterancestors() yields the nearest parent first
parents = list(b.element.iterancestors())
