I'm trying to create a regex pattern to match "faux" HTML tags for a small application I am building.
I have created the regex to capture found matches within {tag}brackets{/tag}
and output them into an array of objects like so:
[
{key : value},
{key : value}
]
Code with the current pattern:
let str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";
let regex = /\{(\w+)}(?:\()?([^\{\)]+)(?:\{\/\1})?/g;
let match;
let matches = [];
// Collect each match as a { tag: content } object
while (match = regex.exec(str)) {
  matches.push({ [match[1]]: match[2] });
}
console.log(matches);
I have realized I need the pattern to capture nested groups as well, and put these into an array, so the result for the above string would be:
[
{p : "This is a paragraph"},
{img : "path/to/image"},
{ul : ["This is a list item", "Another list item"]}
]
The idea here is to match each tag in order, so that the indexes of the array match the order in which the tags are found (i.e. the first paragraph in the string above is array[0], and so forth).
If anyone has a bit of input on how I could structure the pattern that would be greatly appreciated. I will not have more than 1 level deep nesting, if that makes any difference.
I am flexible to use a different markup for the ul if this would help; however, I cannot use square brackets [text] due to conflicts with another function that generates the text I am trying to extract in this step.
Edit: An idea that hit me is to have a third capturing group to capture and push to the list-array, but I am unsure whether or not this would work in reality? I have not gotten it to work so far
JavaScript has no support for recursion within regular expressions, which would otherwise be a potential solution.
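Since the nesting is limited to one level, recursion can be sidestepped by running two passes: one pattern for the outer tags and a second one over the content of each outer match. The following is only a rough sketch under that assumption; the names outer and inner are illustrative, and it assumes {img} always uses the (path) form shown in the question.

```javascript
const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

// Outer pass: either {img}(path), or {tag}...{/tag} where \2 refers back to the tag name.
const outer = /\{img\}\(([^)]*)\)|\{(\w+)\}([\s\S]*?)\{\/\2\}/g;
// Inner pass: any {tag}text{/tag} pair found inside an outer match.
const inner = /\{(\w+)\}([\s\S]*?)\{\/\1\}/g;

const matches = [];
for (const m of str.matchAll(outer)) {
  if (m[1] !== undefined) {
    // The {img}(path) alternative matched.
    matches.push({ img: m[1] });
    continue;
  }
  // Collect the text of nested tags; fall back to the raw content when there are none.
  const children = Array.from(m[3].matchAll(inner), c => c[2]);
  matches.push({ [m[2]]: children.length ? children : m[3] });
}

console.log(matches);
// [ { p: 'This is a paragraph' },
//   { img: 'path/to/image' },
//   { ul: [ 'This is a list item', 'Another list item' ] } ]
```

This breaks down as soon as a tag can contain another tag of the same name, or nesting goes deeper than one level.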
I would however go for a different solution:
You could rely on DOMParser, which is available in browsers; if you are on Node, similar functionality is available in several modules.
To use it, you need to have an XML formatted string, so unless you want to use <p> style of tags, you'd first have to convert your string to that, making sure that any literal < in the content is replaced with &lt; instead.
Also, the {img} tag would need to get a closing tag instead of the parentheses, so a replacement is necessary for that particular case.
Once that is out of the way, it is quite straightforward to get a DOM from that XML, which might already be good enough for you to work with, but it can be simplified to your desired structure with a simple recursive function:
const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

// Convert to XML: give {img} a closing tag, escape any literal "<",
// then turn the braces into angle brackets.
const xml = str.replace(/\{img\}\((.*?)\)/g, "{img}$1{/img}")
  .replace(/</g, "&lt;")
  .replace(/\{/g, "<").replace(/\}/g, ">");

const parser = new DOMParser();
const dom = parser.parseFromString("<root>" + xml + "</root>", "application/xml").firstChild;

// Text nodes (nodeType 3) become trimmed strings; elements become { tagName: content }.
const parse = dom => dom.nodeType === 3 ? dom.nodeValue.trim() : {
  [dom.nodeName]: dom.children.length
    ? Array.from(dom.childNodes, parse).filter(Boolean) // drop whitespace-only text
    : dom.firstChild.nodeValue
};

const result = parse(dom).root;
console.log(result);
The output is almost what you intended, except that the li elements are also represented as { li: "...." } objects.
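If plain strings are preferred for the list items, the parsed result could be post-processed. Below is a minimal sketch, assuming the one-level nesting stated in the question; flattenChildren is a hypothetical helper name, and the parsed array is written out by hand here to stand in for the result of the code above.

```javascript
// Hypothetical helper: replace nested { tag: text } objects with their text.
// Assumes every item is an object with exactly one key.
function flattenChildren(items) {
  return items.map(item => {
    const [tag, value] = Object.entries(item)[0];
    if (!Array.isArray(value)) return item; // leaf such as { p: "..." }
    return {
      [tag]: value.map(child =>
        typeof child === "string" ? child : Object.values(child)[0]
      )
    };
  });
}

// Stand-in for the structure produced by the DOM-based parse.
const parsed = [
  { p: "This is a paragraph" },
  { img: "path/to/image" },
  { ul: [{ li: "This is a list item" }, { li: "Another list item" }] }
];

const flattened = flattenChildren(parsed);
console.log(flattened);
// [ { p: 'This is a paragraph' },
//   { img: 'path/to/image' },
//   { ul: [ 'This is a list item', 'Another list item' ] } ]
```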