Closing tag events when iterating over DOM in JavaScript

I am writing a Chrome Extension to convert HTML pages into a different format.

If I use document.getElementsByTagName("*") and iterate over that collection, I can see all the tags. However, it's a flat representation. I need to detect the opening and closing "events", like a SAX parser, so that my translated output maintains proper containment/nesting.

What is the right way to do this in JavaScript? It seems a little awkward to have to do this manually. Is there any other way to do this?

To illustrate what I mean...

   <html>
       <body>
           <h1>Header</h1>
           <div>
               <p>some text and a missing closing tag
               <p>some more text</p>
           </div>
           <p>some more dirty HTML
       </body>
   </html>

I need to get the events in this order:

    html open
    body open
    h1 open
    text
    h1 close
    div open
    p open
    text
    p close
    p open
    text
    p close
    div close
    p open
    text
    p close
    body close
    html close

I get the feeling it's up to me to track the SAX-parser-like events as part of my iteration. Are there any other options available to me? If not, can you point me to any sample code?

Thanks!

Just traverse each node and all the children of each node. When a level of children is exhausted, the tag is closed.

function parseChildren(node) {

    // if this is a text node, it has no children or open/close tags
    if (node.nodeType === 3) {
        console.log("text");
        return;
    }

    // skip comments, doctypes and other non-element nodes, which have no tagName
    if (node.nodeType !== 1) {
        return;
    }

    console.log(node.tagName.toLowerCase() + " open");

    // parse the child nodes of this node
    for (var i = 0; i < node.childNodes.length; ++i) {
        parseChildren(node.childNodes[i]);
    }

    // all the children are used up, so this tag is done
    console.log(node.tagName.toLowerCase() + " close");
}

To traverse the whole page, just do parseChildren(document.documentElement), which starts at the root html element. You can replace the console.log statements with whatever behavior you like.

Note that this code reports a lot of text nodes, because the whitespace between tags counts as a text node. To avoid this, just expand the text handling code:

    if(node.nodeType == 3) {
        // if this node is all whitespace, don't report it
        if(node.data.replace(/\s/g,'') == '') { return; }

        // otherwise, report it
        console.log("text");
        return;
    }
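
If you want something closer to a SAX handler than hard-coded console.log calls, one option is to pass the callbacks in. This is only a sketch of the same recursive walk: the names walk, collectEvents and the handler methods open, text and close are made up for illustration and are not part of any DOM API.

```javascript
// Recursive walk that reports SAX-style events through a handler object.
// The handler shape (open/text/close) is a hypothetical convention.
function walk(node, handler) {
    if (node.nodeType === 3) {               // text node
        if (node.data.trim() !== "") {       // ignore whitespace-only text
            handler.text(node.data);
        }
        return;
    }
    if (node.nodeType !== 1) {               // skip comments, doctypes, etc.
        return;
    }

    var name = node.tagName.toLowerCase();
    handler.open(name);
    for (var i = 0; i < node.childNodes.length; ++i) {
        walk(node.childNodes[i], handler);
    }
    handler.close(name);
}

// Example handler: collect the events into an array of strings.
function collectEvents(root) {
    var events = [];
    walk(root, {
        open: function (name) { events.push(name + " open"); },
        text: function () { events.push("text"); },
        close: function (name) { events.push(name + " close"); }
    });
    return events;
}
```

In a page you would call collectEvents(document.documentElement). Keep in mind that the browser has already repaired the dirty HTML by the time you walk the DOM, so you may also see elements the parser added on its own, such as head.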

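I don't think there is a ready-made tool for this, so you will have to write a recursive function yourself, one that gets the first child, gets the next node, somehow gets the parent, and so on.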
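
As a rough sketch of that idea, the walk can also be done without recursion by following firstChild, nextSibling and parentNode directly. The function name walkIterative and the emit callback are assumptions for illustration; only the three node properties come from the DOM.

```javascript
// Iterative depth-first walk using firstChild / nextSibling / parentNode.
// `emit` is a hypothetical callback that receives each event as a string.
function walkIterative(root, emit) {
    var node = root;
    while (node) {
        if (node.nodeType === 1) {                    // element
            emit(node.tagName.toLowerCase() + " open");
            if (node.firstChild) {
                node = node.firstChild;               // descend first
                continue;
            }
            // no children, so the element closes immediately
            emit(node.tagName.toLowerCase() + " close");
        } else if (node.nodeType === 3 && node.data.trim() !== "") {
            emit("text");                             // non-blank text node
        }

        // No sibling left at this level: climb up, closing each parent.
        while (node !== root && !node.nextSibling) {
            node = node.parentNode;
            emit(node.tagName.toLowerCase() + " close");
        }
        if (node === root) {
            return;                                   // whole subtree done
        }
        node = node.nextSibling;
    }
}
```

For example, walkIterative(document.documentElement, function (e) { console.log(e); }) should print the same sequence as the recursive version, without the risk of deep recursion on heavily nested pages.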

