How to find out pairs of opening and closing html tags using javascript?

I have an array of parsed HTML:

// This is markup only: any inner text is omitted for simplicity.


const parsedHtml = [
    '<div class="container">',
    '<div class="wrapper">',
    '<h3>',
    '</h3>',
    '<p>',
    '</p>',
    '<span>',
    '<a href="#">',
    '<img src="./img.svg">',
    '</span>',
    '</div>',
    '</div>'
]

// this whole array is a block of html code (nesting is in the above order)

The idea is to find the pairs of opening and closing tags (just their indices), so that I can separate out blocks of code like this:

<div class="container">
...
</div>


// or

<h3>
</h3>

//or 

<span>
...
</span>


I just need a way to find the index of the closing tag that matches a given opening tag (think of it like folding blocks of code in VS Code).

I could check whether parsedHtml[i].startsWith('</'), but that alone does not guarantee a matching opening and closing pair like this:

<div> ---> opening

</div> --->  closing

[pair]
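One way to go beyond the startsWith('</') check is to count nesting depth for a single tag name, starting from the opening tag. The following is a minimal sketch, not a full parser: `tagNameOf` and `findClosingIndex` are hypothetical names, and it assumes each array entry is exactly one tag and that `openIndex` points at an opening tag.

```javascript
// Hypothetical helper: '<div class="x">' -> 'div', '</div>' -> 'div'.
const tagNameOf = (tag) => tag.replace(/^<\/?/, '').split(/[\s/>]/)[0].toLowerCase();

// Returns the index of the closing tag that matches the opening tag at
// openIndex, or -1 if no matching closing tag exists.
function findClosingIndex(tags, openIndex) {
  const name = tagNameOf(tags[openIndex]);
  let depth = 0;
  for (let i = openIndex; i < tags.length; i++) {
    if (tagNameOf(tags[i]) !== name) continue; // other tag names do not affect the count
    if (tags[i].startsWith('</')) {
      depth -= 1;
      if (depth === 0) return i; // back at the level of the opening tag
    } else {
      depth += 1; // the opener itself, or a nested tag of the same name
    }
  }
  return -1;
}

const sample = ['<div>', '<div>', '<p>', '</p>', '</div>', '<img src="a.png">', '</div>'];
console.log(findClosingIndex(sample, 0)); // 6
console.log(findClosingIndex(sample, 1)); // 4
console.log(findClosingIndex(sample, 5)); // -1 (img has no closing tag)
```

Each call scans forward from the opener, so this suits looking up one block at a time rather than pairing every tag in a large array.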

NOTE

This is for finding the nesting of tags so that I can indent the HTML accordingly and show each element as a block. I don't want to use packages like parse5, marked, prismjs, or highlight.js.

My requirement is custom: I just need to find the opening and closing tag pairs, so that I can work out how things are nested in the parsed HTML array above.
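For the indentation goal specifically, the pairs are not strictly required: tracking which elements are currently open is enough. Below is a rough sketch under some assumptions: `indentTags`, `nameOf`, and `VOID_TAGS` are made-up names, each array entry is a single tag, and a closing tag is taken to implicitly close any still-open tags above its opener (such as the unclosed <a> in the array above).

```javascript
// Void elements never have a closing tag, so they must not increase the depth.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Hypothetical helper: '<div class="x">' -> 'div', '</div>' -> 'div'.
const nameOf = (tag) => tag.replace(/^<\/?/, '').split(/[\s/>]/)[0].toLowerCase();

// Returns a new array where each tag is prefixed with indentation
// proportional to its nesting depth.
function indentTags(tags, unit = '  ') {
  const open = []; // names of the elements currently open, innermost last
  return tags.map((tag) => {
    const name = nameOf(tag);
    if (tag.startsWith('</')) {
      // Drop back to the matching opener; anything above it was never closed.
      const at = open.lastIndexOf(name);
      if (at !== -1) open.length = at;
      return unit.repeat(open.length) + tag;
    }
    const line = unit.repeat(open.length) + tag;
    if (!VOID_TAGS.has(name)) open.push(name);
    return line;
  });
}

console.log(indentTags(['<div>', '<span>', '<a href="#">', '<img src="a.png">', '</span>', '</div>']).join('\n'));
```

A closing tag with no opener on the stack is left at the current depth rather than treated as an error, which is a design choice for display purposes only.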

Here is my approach:

var parsedHtml = [
   '<div class="container">',
   '<div class="wrapper">',
   '<h3>',
   '</h3>',
   '<p>',
   '</p>',
   '<span>',
   '<a href="#">',
   '<img src="./img.svg">',
   '</span>',
   '</div>',
   '</div>'
];
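// Strip the angle brackets and attributes: '<div class="x">' -> 'div', '</div>' -> '/div'
var getTag = (s) => s.replace(/<|>/g, '').split(' ')[0];
var isCloseTag = (t) => t.startsWith('/');

var indices = parsedHtml.map(getTag).reduce(collectIndices, {});
console.log(JSON.stringify(indices)); // {"div":[[0,11],[1,10]],"h3":[[2,3]],"p":[[4,5]],"span":[[6,9]],"a":[[7]],"img":[[8]]}

// Groups indices by tag name as [openIndex, closeIndex]; a tag that is never closed stays as [openIndex].
function collectIndices(indices, tag, i) {
   const tagName = tag.replace('/', '');
   if (!(tagName in indices)) {
      indices[tagName] = [[i]];
      return indices;
   }
   if (isCloseTag(tag)) {
      // Search a reversed copy for the most recent unclosed opener.
      // Reversing in place would reorder the stored entries and can pair the wrong tags.
      indices[tagName].slice().reverse().find((ins) => ins.length === 1)?.push(i);
      return indices;
   }
   indices[tagName].push([i]);
   return indices;
}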
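An alternative to grouping by tag name is a single stack of open tags, which yields the pairs directly in document order. This is only a sketch: `pairTags` and `nameFrom` are illustrative names, and it assumes that a closing tag also ends any unclosed or void tags opened after its opener.

```javascript
// Hypothetical helper: '<div class="x">' -> 'div', '</div>' -> 'div'.
const nameFrom = (tag) => tag.replace(/^<\/?/, '').split(/[\s/>]/)[0].toLowerCase();

// Returns [openIndex, closeIndex] pairs sorted by openIndex.
// Tags that are never closed (for example <img>) produce no pair.
function pairTags(tags) {
  const stack = []; // { name, index } for every tag that is still open
  const pairs = [];
  tags.forEach((tag, i) => {
    const name = nameFrom(tag);
    if (!tag.startsWith('</')) {
      stack.push({ name, index: i });
      return;
    }
    // Walk down from the top of the stack to the nearest opener with this name.
    for (let j = stack.length - 1; j >= 0; j--) {
      if (stack[j].name === name) {
        pairs.push([stack[j].index, i]);
        stack.length = j; // discard the opener and any unclosed tags above it
        break;
      }
    }
  });
  return pairs.sort((a, b) => a[0] - b[0]);
}

const tags = ['<div>', '<div>', '</div>', '<div>', '<div>', '</div>', '</div>'];
console.log(JSON.stringify(pairTags(tags))); // [[1,2],[3,6],[4,5]]
```

In this example the first <div> is never closed, so index 0 has no pair, while the last </div> is matched with the opener at index 3.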

I found this answer, which uses a JavaScript regex: https://www.octoparse.com/blog/using-regular-expression-to-match-html

All you have to do is put in the tag that you are looking for.

For example, to look for the a tag: /<a\s*.*>\s*.*<\/a>/gi

You can test it out with this regex tool: https://regexr.com/
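The pattern above is meant for a single HTML string. For an array of individual tag strings like the one in the question, a regex can instead classify each entry as opening, closing, or self-closing. This is a sketch with invented names (`TAG_PATTERN`, `classifyTag`), and it assumes no attribute value contains a `>` character.

```javascript
// Groups: 1 = leading slash (closing tag), 2 = tag name, 3 = trailing slash (self-closing).
const TAG_PATTERN = /^<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>$/;

// Returns { name, kind } for a single tag string, or null if it is not a tag.
function classifyTag(tag) {
  const match = TAG_PATTERN.exec(tag.trim());
  if (!match) return null;
  const [, slash, name, selfClose] = match;
  const kind = slash ? 'close' : selfClose ? 'self-closing' : 'open';
  return { name: name.toLowerCase(), kind };
}

console.log(classifyTag('<a href="#">')); // { name: 'a', kind: 'open' }
console.log(classifyTag('</span>'));      // { name: 'span', kind: 'close' }
console.log(classifyTag('<br/>'));        // { name: 'br', kind: 'self-closing' }
```

Note that a void tag written without the trailing slash, such as <img src="./img.svg">, is reported as 'open' here, so the caller still has to decide how to treat tags that never get closed.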
