How to find pairs of opening and closing HTML tags in JavaScript?
I have an array of parsed HTML:
// This is markup only; any inner text is omitted for simplicity.
const parsedHtml = [
'<div class="container">',
'<div class="wrapper">',
'<h3>',
'</h3>',
'<p>',
'</p>',
'<span>',
'<a href="#">',
'<img src="./img.svg">',
'</span>',
'</div>',
'</div>'
]
// The whole array is one block of HTML (nesting follows the order above).
The idea is to find the pairs of opening and closing tags (just their indices),
so that I can separate out blocks of code like this:
<div class="container">
...
</div>
// or
<h3>
</h3>
//or
<span>
...
</span>
I just need a way to find the index of the closing tag that matches a given opening tag (think of it like folding blocks of code in VS Code).
I could check whether parsedHtml[i].startsWith('</'),
but that alone only tells me a tag is a closing tag; it does not tell me which opening tag it belongs to, as in:
<div> ---> opening
</div> ---> closing
[pair]
NOTE
This is for finding the nesting of tags so that I can indent the HTML accordingly and show each element as a block. I don't want to use packages like parse5, marked, prismjs, or highlight.js.
My requirement is custom: I only need the opening and closing tag pairs, so that I can work out how things are nested in the parsed HTML array above.
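Here's my approach. It reduces the array to an object that maps each tag name to a list of [openIndex, closeIndex] pairs; a tag that is never closed (a and img in this sample) keeps only its opening index: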
var parsedHtml = [
'<div class="container">',
'<div class="wrapper">',
'<h3>',
'</h3>',
'<p>',
'</p>',
'<span>',
'<a href="#">',
'<img src="./img.svg">',
'</span>',
'</div>',
'</div>'
];
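// Strip the angle brackets and attributes: '<div class="x">' -> 'div', '</div>' -> '/div'.
var getTag = (s) => s.replace(/<|>/g, '').split(' ')[0];
// After getTag, a closing tag name starts with '/'.
var isCloseTag = (t) => t.startsWith('/');
var indices = parsedHtml.map(getTag).reduce(collectIndices, {});
console.log(JSON.stringify(indices)); // {"div":[[0,11],[1,10]],"h3":[[2,3]],"p":[[4,5]],"span":[[6,9]],"a":[[7]],"img":[[8]]}
function collectIndices(indices, tag, i) {
  const tagName = tag.replace('/', '');
  if (isCloseTag(tag)) {
    // Pair with the most recent still-open tag of the same name.
    // slice() copies the list first, so reverse() does not reorder the stored entries.
    const open = (indices[tagName] || []).slice().reverse().find((ins) => ins.length === 1);
    if (open) open.push(i); // a closing tag with no opener is ignored
    return indices;
  }
  // Opening tag: record its index and wait for a matching close.
  (indices[tagName] = indices[tagName] || []).push([i]);
  return indices;
}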
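Since the goal is indentation, a stack may be a more direct fit than grouping by tag name, because the stack size at any point is the nesting depth. The following is only a sketch under some assumptions: pairTags, tagNameOf and VOID_TAGS are hypothetical names, void elements such as img are assumed never to have a closing tag, and an opener that is still on the stack when an outer tag closes (the a tag in the sample) is reported with close: null.

```javascript
// Standard HTML void elements: they never have a closing tag.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// '<div class="x">' -> 'div', '</div>' -> 'div', anything else -> null.
const tagNameOf = (s) => {
  const m = /^<\/?\s*([a-zA-Z][\w-]*)/.exec(s);
  return m ? m[1].toLowerCase() : null;
};

// Returns one { name, open, close, depth } record per opening tag, sorted by open index.
function pairTags(tags) {
  const stack = []; // opening tags still waiting for their close
  const pairs = [];
  tags.forEach((raw, i) => {
    const name = tagNameOf(raw);
    if (name === null) return; // not a tag, skip
    if (raw.startsWith('</')) {
      // Walk down the stack to the nearest opener with the same name.
      let k = stack.length - 1;
      while (k >= 0 && stack[k].name !== name) k--;
      if (k < 0) return; // stray closing tag, ignored
      // Everything above that opener was never closed.
      stack.splice(k + 1).forEach((e) => pairs.push({ ...e, close: null }));
      pairs.push({ ...stack.pop(), close: i });
    } else if (VOID_TAGS.has(name) || raw.endsWith('/>')) {
      pairs.push({ name, open: i, close: null, depth: stack.length });
    } else {
      stack.push({ name, open: i, depth: stack.length });
    }
  });
  // Openers left at the end of input are unclosed too.
  stack.forEach((e) => pairs.push({ ...e, close: null }));
  return pairs.sort((a, b) => a.open - b.open);
}

const sample = [
  '<div class="container">', '<div class="wrapper">', '<h3>', '</h3>',
  '<p>', '</p>', '<span>', '<a href="#">', '<img src="./img.svg">',
  '</span>', '</div>', '</div>',
];
const pairs = pairTags(sample);

// Indent every line by the depth of the element it opens or closes.
const depthAt = new Map(pairs.flatMap((p) =>
  p.close === null ? [[p.open, p.depth]] : [[p.open, p.depth], [p.close, p.depth]]));
const indented = sample
  .map((t, i) => '  '.repeat(depthAt.get(i) ?? 0) + t)
  .join('\n');
console.log(indented);
```

For the sample array this pairs indices 0 and 11, 1 and 10, 2 and 3, 4 and 5, and 6 and 9, and leaves the a and img entries without a closing index. Because the a tag is never closed, the img is treated as nested inside it.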
I found this approach, which uses a JavaScript regex, here: https://www.octoparse.com/blog/using-regular-expression-to-match-html
All you have to do is substitute the tag you are looking for.
For example, for the a tag: /<a\s*.*>\s*.*<\/a>/gi
Note that this matches against a single HTML string rather than an array of tags, so the array would need to be joined first.
You can test it with this regex tool: https://regexr.com/
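A short sketch of how that pattern behaves may help before relying on it. The html string and the lazy variant below are my own illustrations, not taken from the linked post. Because .* is greedy, the quoted pattern returns one match spanning from the first opening a tag to the last closing one when a line contains two links; a non-greedy variant separates them, though neither can pair nested tags of the same name.

```javascript
// Hypothetical input: two links on one line.
const html = '<p><a href="#one">one</a> and <a href="#two">two</a></p>';

// The pattern quoted above: '.*' is greedy and runs to the last '</a>'.
const greedy = /<a\s*.*>\s*.*<\/a>/gi;

// Illustrative variant: '\b' stops '<a' from matching longer names such as '<abbr',
// '[^>]*' stays inside the opening tag, and '.*?' stops at the first '</a>'.
const lazy = /<a\b[^>]*>.*?<\/a>/gi;

const greedyMatches = html.match(greedy);
const lazyMatches = html.match(lazy);

console.log(greedyMatches.length); // 1
console.log(lazyMatches);          // [ '<a href="#one">one</a>', '<a href="#two">two</a>' ]
```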