
How can I implement an algorithm that traverses an HTML tree in Java?

I have to walk a tree that I receive as a NodeList. I need an algorithm that traverses all nodes in order, most likely depth first, but I don't know how to implement it. I think I need some recursion. Can anybody help?

The relevant part of the code is:

NodeList nodeLista = documento.getElementsByTagName("html");

for (int s = 0; s < nodeLista.getLength(); s++) {
    Node Raiz = nodeLista.item(s);

....

    for (int h = 0; h < nodeLista.getLength(); h++) {

        // Depth level 1.
        Node Primer_Hijo = nodeLista.item(h); // First iteration: HEAD; second iteration: BODY.

        // Depth level 2.
        Element SegundoElemento = (Element) Primer_Hijo;
        NodeList ListadeNodos2 = SegundoElemento.getChildNodes();

.....

Recursive descent is exactly what you are looking for.

http://en.wikipedia.org/wiki/Recursive_descent_parser

For parsing HTML I have used Jerry in the past.

It bills itself as jQuery for Java and lets you use CSS-style selectors. I think several libraries implement CSS-style selectors now.

It leads to more readable code, though it might not fit your use case.

This is the pseudocode:

    traverse_tree(node) {
        childNodes = node.getChildNodes();
        if (childNodes is empty) {
            print valueOf(node);
            return;
        }
        for each childNode in childNodes {
            traverse_tree(childNode);
        }
    }

Start the traversal by calling traverse_tree(rootNode), where rootNode is the root node of the tree.
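The pseudocode above could be turned into runnable Java roughly as follows. This is only a sketch under some assumptions: the input is well-formed XHTML that the JDK's XML parser accepts (real-world HTML often is not), and the names DomWalk, traverse and outline are made up for illustration. Unlike the pseudocode, which prints only leaf nodes, this version records every element it visits, indented by depth.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class DomWalk {

    // Pre-order depth-first walk: record the node, then recurse into each child.
    static void traverse(Node node, int depth, List<String> out) {
        // Text, comment and other non-element nodes are visited but not recorded.
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add("  ".repeat(depth) + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            traverse(children.item(i), depth + 1, out);
        }
    }

    // Hypothetical helper: parses well-formed XHTML and returns the indented element names.
    static List<String> outline(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            traverse(doc.getDocumentElement(), 0, out);
            return out;
        } catch (Exception e) {
            // Assumption: any parser failure is reported as a bad argument.
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        outline("<html><head><title>t</title></head><body><p>hi</p></body></html>")
                .forEach(System.out::println);
    }
}
```

Starting from doc.getDocumentElement() means there is a single root, so no outer loop over a NodeList is needed.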

Something like this:

public static void main(String[] args) {
    //get the nodeList
    //...
    for (int h = 0; h < nodeLista.getLength(); h++) {
        Node Primer_Hijo = nodeLista.item(h); 
        navegate(Primer_Hijo);
    }

    //or (better) the root node
    navegate(rootNode);
}

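// static so that it can be called from main
static void navegate(Node node) {
    // do something with node
    node.getAttributes();
    // ...

    // recurse into every child, depth first
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
        navegate(children.item(i));
    }
}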
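If the tree can be very deep, the same depth-first order can also be produced without recursion by keeping an explicit stack. The following is a sketch under assumptions, not part of the answer above: the class IterativeDomWalk and its methods elementNames and parseRoot are hypothetical names, and the input is assumed to be well-formed XHTML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class IterativeDomWalk {

    // Returns the element names under root in pre-order (depth-first) order.
    static List<String> elementNames(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                names.add(node.getNodeName());
            }
            NodeList children = node.getChildNodes();
            // Push children right to left so the leftmost child is popped first.
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return names;
    }

    // Hypothetical helper: parses well-formed XHTML and returns its root element.
    static Node parseRoot(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Node root = parseRoot("<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>");
        System.out.println(elementNames(root));
    }
}
```

The visiting order matches the recursive version; the only difference is that pending nodes live on the heap instead of the call stack.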
