
Getting element tag path in jSoup

Is there an efficient way to get an HTML element tag path of all the open but not closed tags with jSoup?

E.g., if the HTML is

<!DOCTYPE html>
<html>
    <head>...</head>
    <body>
        <section id="secID">
            <div class="divClass">
                <section id="subSection">
                    <h3>Heading</h3>
                     <ul class="list">
                        <li>

When I get to li, I want its path to be html->body->section->div->section->ul

I believe a good way would be to check whether the element you are on has children, via the children() method. If it does, put that element in a list, continue with its first child, and repeat. When there are no children left, you have your list. It's a recursive idea: you then do the same with the second child, and so on.

EDIT: A bit of explanation

Let's say you are on the html tag. Call children(), take the returned list, and start with its first element: call children() on it, which returns another list, then call children() on that list's first element, and so on. When you reach an element with no children, go back up to the parent element and continue with its second child. The traversal ends when you have visited all nodes reachable from the initial list (the children of the html element). It's a recursive idea that visits every element, so it is not the most efficient approach, but it's solid.

<html>   <--- children: head, body
    <head>text</head> <--- just a text node, so no child elements
    <body>   <--- second child of html; children: ul
        <ul> <--- empty, no child elements; go back up to the parent element
        </ul>
    </body>
</html>
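
The walk described above might be sketched as follows. This is only an illustration, assuming jsoup is on the classpath; the class name PathWalker and the helpers walk and allPaths are made up for this example. It keeps a stack of the currently open tags and records, for each element, the path from the root down to that element itself (drop the last segment to get only the open ancestors).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
public class PathWalker {

    // Depth-first walk: records the tag path of el and of every element below it.
    static void walk(Element el, Deque<String> open, List<String> paths) {
        open.addLast(el.tagName());              // el is now an "open" tag
        paths.add(String.join("->", open));      // path from the root down to el
        for (Element child : el.children()) {    // children() returns child elements only
            walk(child, open, paths);
        }
        open.removeLast();                       // el is "closed" again
    }

    // Parses the HTML and collects one path per element, in document order.
    static List<String> allPaths(String html) {
        Document doc = Jsoup.parse(html);
        List<String> paths = new ArrayList<>();
        walk(doc.select("html").first(), new ArrayDeque<>(), paths);
        return paths;
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><ul></ul></body></html>";
        allPaths(html).forEach(System.out::println);
    }
}
```

Each element is visited once, so the cost grows with the size of the document; that is fine when you need the paths of all elements, but more work than necessary for a single element.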

To get the list of 'open' elements, you can simply use the Element.parents() method. It returns the ancestors starting with the closest parent, so if you want the list to start with the root element, you must reverse the returned list, but that should be trivial to achieve.
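
A minimal sketch of this approach, assuming jsoup is on the classpath. The class name TagPath and the helper tagPath are hypothetical names chosen for illustration, and the HTML string is a shortened version of the markup in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
public class TagPath {

    // Returns the ancestor tag names of el, root first, joined with "->".
    static String tagPath(Element el) {
        // parents() lists ancestors closest-first, so copy the list and reverse it
        List<Element> ancestors = new ArrayList<>(el.parents());
        Collections.reverse(ancestors);
        return ancestors.stream()
                .map(Element::tagName)
                .collect(Collectors.joining("->"));
    }

    public static void main(String[] args) {
        // Unclosed tags are closed automatically by the parser.
        String html = "<html><head><title>t</title></head><body>"
                + "<section id=\"secID\"><div class=\"divClass\">"
                + "<section id=\"subSection\"><h3>Heading</h3>"
                + "<ul class=\"list\"><li>item";
        Document doc = Jsoup.parse(html);
        Element li = doc.select("li").first();
        System.out.println(tagPath(li)); // html->body->section->div->section->ul
    }
}
```

This only walks up from the one element you care about, so it avoids traversing the rest of the document.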
