
Modify regex pattern to capture nested tags into an array of objects

I'm trying to create a regex pattern to match "faux" HTML tags for a small application I am building.

I have written a regex that captures matches of the form {tag}content{/tag} and outputs them into an array of objects like so:

[
  {key : value},
  {key : value}
]

Code with the current pattern:

    let str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";
    let regex = /\{(\w+)}(?:\()?([^\{\)]+)(?:\{\/1})?/g;
    let match;
    let matches = [];

    while (match = regex.exec(str)) {
      matches.push({ [match[1]]: match[2] });
    }

    console.log(matches);


I have realized I need the pattern to capture nested groups as well, and put these into an array – so the result for the above string would be:

[
  {p : "This is a paragraph"},
  {img : "path/to/image"},
  {ul : ["This is a list item", "Another list item"]}
]

The idea here is to match each tag in order, so that the array indexes follow the order in which the tags are found (i.e. the first paragraph in the string above is array[0], and so forth).

If anyone has input on how I could structure the pattern, that would be greatly appreciated. I will not have more than one level of nesting, if that makes any difference.

I am open to using different markup for the ul if that would help; however, I cannot use square brackets [text], because they conflict with another function that generates the text I am trying to extract in this step.

Edit: One idea that occurred to me is to add a third capturing group whose matches are pushed to the list array, but I am unsure whether this would work in practice. I have not gotten it to work so far.

JavaScript has no support for recursion within regular expressions, which would otherwise be a potential solution.
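That said, because the question limits nesting to one level, recursion is not strictly required. The following is a minimal sketch of a two-pass approach, not a general parser: an outer pattern matches each top-level tag using a backreference to its name, and an inner pattern is then run on that tag's content. The names outer, inner and parseTags are made up for this illustration.

```javascript
// Sketch only: assumes at most one level of nesting, as stated in the question.
const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

// Matches either {tag}(argument) or {tag}content{/tag};
// \1 requires the closing tag to repeat the opening tag's name.
const outer = /\{(\w+)\}(?:\(([^)]*)\)|([\s\S]*?)\{\/\1\})/g;
// Run on the content of a matched tag to find its direct children.
const inner = /\{(\w+)\}([\s\S]*?)\{\/\1\}/g;

function parseTags(input) {
  const result = [];
  for (const [, tag, arg, content] of input.matchAll(outer)) {
    if (arg !== undefined) {
      // Parenthesised form, e.g. {img}(path/to/image)
      result.push({ [tag]: arg });
      continue;
    }
    // Collect the text of any child tags; fall back to the raw content.
    const children = Array.from(content.matchAll(inner), m => m[2]);
    result.push({ [tag]: children.length ? children : content });
  }
  return result;
}

const regexResult = parseTags(str);
console.log(regexResult);
```

Note that this discards the names of the child tags, and it would not cope with a tag nested inside another tag of the same name.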

I would however go for a different solution:

You could rely on DOMParser, which is available in browsers; if you are on Node, similar functionality is available in several modules.

To use it, you need an XML-formatted string. Unless you want to switch to <p>-style tags, you would first have to convert your string to that format, making sure that any literal < in the content is replaced with &lt;.

Also, the {img} tag would need a closing tag instead of the parentheses, so a replacement is necessary for that particular case.
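As a rough sketch, the conversion described above could be wrapped in a small helper. The name toXml is hypothetical, and the extra step that escapes & is my own assumption rather than part of the original steps: XML also treats a bare ampersand as invalid, so it is escaped first to avoid touching the &lt; entities added afterwards.

```javascript
// Hypothetical helper: converts the {tag} markup into an XML string.
function toXml(input) {
  return input
    .replace(/&/g, "&amp;")                          // assumption: also escape ampersands, and do it first
    .replace(/</g, "&lt;")                           // escape literal "<" in the content
    .replace(/\{img\}\((.*?)\)/g, "{img}$1{/img}")   // {img}(path) -> {img}path{/img}
    .replace(/\{/g, "<")                             // {tag} -> <tag>
    .replace(/\}/g, ">");
}

console.log(toXml("{p}a < b & c{/p} {img}(x/y)"));
```

The order matters: the escaping has to happen before the braces are turned into angle brackets, otherwise the newly created tags would be escaped as well.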

Once that is out of the way, it is quite straightforward to get a DOM from that XML, which might already be good enough for you to work with, but it can be simplified to your desired structure with a simple recursive function:

    const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

    // Give {img}(...) a closing tag, escape "<", then turn {tag} into <tag>
    const xml = str.replace(/\{img\}\((.*?)\)/g, "{img}$1{/img}")
                   .replace(/</g, "&lt;")
                   .replace(/\{/g, "<").replace(/\}/g, ">");

    const parser = new DOMParser();
    const dom = parser.parseFromString("<root>" + xml + "</root>", "application/xml").firstChild;

    // Text nodes become trimmed strings; elements become { nodeName: children or text }
    const parse = dom => dom.nodeType === 3 ? dom.nodeValue.trim() : {
        [dom.nodeName]: dom.children.length
            ? Array.from(dom.childNodes, parse).filter(Boolean)
            : dom.firstChild.nodeValue
    };

    const result = parse(dom).root;
    console.log(result);

The output is almost what you intended, except that the li elements are also represented as { li: "...." } objects.
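If the plain array of strings from the question is preferred, the nested objects can be unwrapped in a post-processing step. This is only a sketch: the parsed value below is written out by hand to mirror the structure just described, and unwrapChildren is a hypothetical helper name.

```javascript
// Hand-written stand-in for the structure described above.
const parsed = [
  { p: "This is a paragraph" },
  { img: "path/to/image" },
  { ul: [{ li: "This is a list item" }, { li: "Another list item" }] }
];

// Hypothetical helper: replaces each nested { tag: value } child with its value.
// Assumes every item is an object with exactly one key.
const unwrapChildren = items => items.map(item => {
  const [[tag, value]] = Object.entries(item);
  if (!Array.isArray(value)) return { [tag]: value };
  return {
    [tag]: value.map(child =>
      // Plain strings (text between child tags) are kept as they are.
      typeof child === "object" ? Object.values(child)[0] : child
    )
  };
});

const flattened = unwrapChildren(parsed);
console.log(flattened);
```

This drops the child tag names, which is fine for a list of li items but would lose information if different child tags needed to be told apart.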
