Closing tag events when iterating over DOM in JavaScript

Question

I am writing a Chrome Extension to convert HTML pages into a different format.

If I use document.getElementsByTagName("*") and iterate over that collection, I can see all the tags. However, it's a flat representation. I need to detect the opening and closing "events", like a SAX parser, so that my translated output maintains proper containment/nesting.

What is the right way to do this in JavaScript? It seems a little awkward to have to do this manually. Is there any other way to do this?

To illustrate what I mean...

    <html>
        <body>
            <h1>Header</h1>
            <div>
                <p>some text and a missing closing tag
                <p>some more text</p>
            </div>
            <p>some more dirty HTML
        </body>
    </html>

I need to get the events in this order:

    html open
    body open
    h1 open
    text
    h1 close
    div open
    p open
    text
    p close
    p open
    text
    p close
    div close
    p open
    text
    p close
    body close
    html close

I get the feeling it's up to me to track the SAX-parser-like events as part of my iteration. Are there any other options available to me? If not, can you point me to any sample code?

Thanks!

Answer 1

Just traverse each node and all of its children recursively (depth first). When all of a node's children have been visited, that node's tag is closed.

    function parseChildren(node) {

        // a text node has no children and no open/close tags
        if (node.nodeType === Node.TEXT_NODE) {
            console.log("text");
            return;
        }

        // skip comments and other non-element nodes (they have no tagName)
        if (node.nodeType !== Node.ELEMENT_NODE) {
            return;
        }

        var tag = node.tagName.toLowerCase();
        console.log(tag + " open");

        // visit the child nodes of this node
        for (var i = 0; i < node.childNodes.length; ++i) {
            parseChildren(node.childNodes[i]);
        }

        // all the children are used up, so this tag is done
        console.log(tag + " close");
    }

To traverse the whole page, call parseChildren(document.documentElement). You can replace the console.log statements with whatever behavior you like.
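Because parseChildren only logs, a SAX-like interface that hands each event to callbacks may be easier to plug into a converter. The following is a minimal sketch, not a browser API: walkDom and its open/text/close handler names are hypothetical. It also replaces the recursion with an explicit stack, so a very deeply nested page cannot exhaust the call stack. Node types are written as the numeric values of Node.ELEMENT_NODE (1) and Node.TEXT_NODE (3) so the function also runs on plain objects outside a browser.

```javascript
// Hypothetical SAX-style walker: calls handlers.open / handlers.text /
// handlers.close in document order for the subtree under `root`.
// Uses an explicit stack instead of recursion.
function walkDom(root, handlers) {
  // Each entry is either a node still to visit or a pending "close" marker.
  const stack = [{ node: root, closing: false }];

  while (stack.length > 0) {
    const { node, closing } = stack.pop();

    if (closing) {
      // All children of this element have been visited.
      handlers.close(node.tagName.toLowerCase(), node);
      continue;
    }

    if (node.nodeType === 3) {            // Node.TEXT_NODE
      handlers.text(node.data, node);
    } else if (node.nodeType === 1) {     // Node.ELEMENT_NODE
      handlers.open(node.tagName.toLowerCase(), node);
      // Push the close marker first so it is popped after every child.
      stack.push({ node, closing: true });
      // Push children in reverse so the first child is popped first.
      for (let i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push({ node: node.childNodes[i], closing: false });
      }
    }
    // Comments and other node types are skipped.
  }
}
```

In a content script you would call it as walkDom(document.documentElement, handlers), where handlers is an object with open, text and close functions. Pushing the close marker before the children is what makes each close event fire only after the element's whole subtree has been visited.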

Note that parseChildren reports a lot of text nodes, because the whitespace between tags (line breaks and indentation) is also stored as text nodes. To avoid this, just expand its text-handling branch:

    if(node.nodeType == 3) {
        // if this node is all whitespace, don't report it
        if(node.data.replace(/\s/g,'') == '') { return; }

        // otherwise, report it
        console.log("text");
        return;
    }
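For the sample page in the question, the browser's HTML parser has already closed the unclosed <p> elements by the time the script runs (a <p> start tag implicitly ends an open <p>), so the DOM's nesting alone yields the requested event order. The sketch below, a hypothetical collectEvents function, combines the recursion with the whitespace check and gathers the events into an array instead of logging them, which may be easier to hand to a converter. It uses trim() in place of the regex replace; both treat a node made only of whitespace as empty. One caveat: a real browser also inserts an empty <head> element, so on that page you would additionally see "head open" and "head close" right after "html open".

```javascript
// Hypothetical helper: collects SAX-style events ("div open", "text",
// "div close") into an array instead of logging them.
// Whitespace-only text nodes are skipped.
function collectEvents(node, events = []) {
  if (node.nodeType === 3) {               // Node.TEXT_NODE
    if (node.data.trim() !== "") {
      events.push("text");
    }
    return events;
  }
  if (node.nodeType !== 1) {               // not Node.ELEMENT_NODE
    return events;                         // skip comments etc.
  }
  const tag = node.tagName.toLowerCase();
  events.push(tag + " open");
  for (const child of node.childNodes) {   // NodeList is iterable
    collectEvents(child, events);
  }
  events.push(tag + " close");
  return events;
}
```

In the extension, collectEvents(document.documentElement) returns the whole page's event list.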

Answer 2

I don't think there is a tool for this, so you should write a recursive function that gets the first child, gets the next node, somehow gets the parent, and so on.
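Following first-child, next-sibling and parent links can even be done without recursion or an explicit stack: descend through firstChild, move sideways through nextSibling, and when a node has no next sibling, climb through parentNode, emitting a close event for every element climbed out of. The hypothetical walkByLinks function below is a minimal sketch of that idea; emit is any callback that receives the event strings.

```javascript
// Hypothetical walker that uses only firstChild, nextSibling and parentNode
// links: no recursion and no explicit stack.
function walkByLinks(root, emit) {
  let node = root;
  while (node) {
    if (node.nodeType === 1) {                         // Node.ELEMENT_NODE
      emit(node.tagName.toLowerCase() + " open");
      if (node.firstChild) {
        node = node.firstChild;                        // descend
        continue;
      }
      emit(node.tagName.toLowerCase() + " close");     // element with no children
    } else if (node.nodeType === 3) {                  // Node.TEXT_NODE
      emit("text");
    }
    // Climb while there is no next sibling, closing each finished parent.
    while (node !== root && !node.nextSibling) {
      node = node.parentNode;
      emit(node.tagName.toLowerCase() + " close");
    }
    if (node === root) {
      break;                                           // whole subtree done
    }
    node = node.nextSibling;                           // move sideways
  }
}
```

For example, walkByLinks(document.documentElement, function (e) { console.log(e); }) logs one event per line. Comments and other non-element, non-text nodes produce no event but are still stepped over.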

2 answers
Answer 1: accepted, 2 votes (2012-08-18 01:19:51)
Answer 2: 0 votes (2012-08-18 01:08:59)