
JavaScript: split an HTML string into an array of tags and inner content

I am trying to match on HTML tags and their inner content and put each match into an array (both the tag and its inner contents).

I was able to match the tags themselves and put those into an array, but I'm not sure how to get the inner contents of each tag as well.

// Example String
let str = "<p><b>Label:</b>Value<p></p><p><b>New Line Label:</b>Value 2</p></p>";
console.log(str.match(/<.*?>/g));
// Output: ["<p>", "<b>", "</b>", "<p>", "</p>", "<p>", "<b>", "</b>", "</p>", "</p>"]


// Expected Output
["<p>", "<b>", "Label:", "</b>", "Value", "<p>", "</p>", "<p>", "<b>", "New Line Label:", "</b>", "Value 2", "</p>", "</p>"]

Can this be handled in a single regex match, or do I need to match the tags first and then look back to the previous closing tag to get the inner content?

You could use the DOMParser API and then iterate through the children of each node:

let doc = new DOMParser().parseFromString('<p><b>Label:</b>Value<p></p><p><b>New Line Label:</b>Value 2</p></p>', 'text/html')

console.log(doc.children) // HTMLCollection holding the root <html> element

With this you'll have a complete DOM built from the string, and you can then apply any function you like to it.
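As a rough sketch of what such a function could look like, the snippet below walks the parsed tree and collects tags and text in document order. The names `flatten` and `tokenize` are made up for illustration. `flatten` drops attributes and emits a closing tag even for void elements like `<br>`, so treat it as a starting point only. Note also that the HTML parser repairs markup while building the DOM (a `<p>` cannot be nested inside another `<p>`, for instance), so the DOM-based result can differ from the raw string. If the goal is just to split the raw string as written, a single regex alternation, shown as `tokenize`, appears to be enough, though it does no validation and will trip over a `>` inside an attribute value.

```javascript
// Hypothetical helper: flatten a DOM subtree into tags and text, in document order.
// Assumption: attributes are not needed, so only the tag name is emitted.
function flatten(node, out = []) {
  for (const child of node.childNodes) {
    if (child.nodeType === 1) {          // 1 === Node.ELEMENT_NODE
      const tag = child.tagName.toLowerCase();
      out.push(`<${tag}>`);
      flatten(child, out);               // recurse into the element's children
      out.push(`</${tag}>`);
    } else if (child.nodeType === 3) {   // 3 === Node.TEXT_NODE
      out.push(child.nodeValue);
    }
  }
  return out;
}

// Hypothetical helper: split the raw string into tags and the text between them.
// The pattern matches either one "<...>" tag or a run of characters with no "<".
function tokenize(html) {
  return html.match(/<[^>]+>|[^<]+/g) ?? [];
}

const str = "<p><b>Label:</b>Value<p></p><p><b>New Line Label:</b>Value 2</p></p>";

// Regex version: works on the string as written, in any JavaScript runtime.
console.log(tokenize(str));

// DOM version: browser only, since DOMParser is not defined in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const doc = new DOMParser().parseFromString(str, "text/html");
  console.log(flatten(doc.body));
}
```

For the example string, `tokenize(str)` yields the array listed under "Expected Output" in the question.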
