Modify regex pattern to capture nested tags into an array of objects

I'm trying to create a regex pattern to match "faux" HTML tags for a small application I am building.

I have created a regex that captures matches within {tag}brackets{/tag} and outputs them into an array of objects like so:

[
  {key : value},
  {key : value}
]

Code with the current pattern:

let str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";
let regex = /\{(\w+)}(?:\()?([^\{\)]+)(?:\{\/1})?/g;
let match;
let matches = [];

while (match = regex.exec(str)) {
  matches.push({ [match[1]]: match[2] });
}

console.log(matches);


I have realized I need the pattern to capture nested groups as well and put these into an array, so the result for the above string would be:

[
  {p : "This is a paragraph"},
  {img : "path/to/image"},
  {ul : ["This is a list item", "Another list item"]}
]

The idea here is to match each tag in order, so that the array indexes follow the order in which the tags are found (i.e. the first paragraph in the string above is array[0], and so forth).

If anyone has input on how I could structure the pattern, that would be greatly appreciated. I will not have nesting more than one level deep, if that makes any difference.

I am open to using different markup for the ul if that would help. However, I cannot use square brackets [text], because they conflict with another function that generates the text I am trying to extract in this step.

Edit: One idea that occurred to me is to add a third capturing group whose matches are pushed to the list array, but I am unsure whether this would work in practice. I have not gotten it to work so far.

JavaScript has no support for recursion within regular expressions, which would otherwise be a potential solution.
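Since the question limits nesting to one level, recursion is not strictly required: a two-pass approach can cover that case. The following is only a minimal sketch under that assumption. The helper name parseTags is hypothetical, and the pattern assumes the {img}(path) form and well-formed closing tags as shown in the question's sample string.

```javascript
// Sketch only: assumes at most one level of nesting and well-formed tags.
const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

// Hypothetical helper, not part of the original question or answer.
function parseTags(input) {
  // Alternative 1: {img}(path). Alternative 2: {tag}...{/tag}, closed via the \2 backreference.
  const outer = /\{img\}\(([^)]*)\)|\{(\w+)\}([\s\S]*?)\{\/\2\}/g;
  const result = [];
  for (const m of input.matchAll(outer)) {
    if (m[2] === undefined) {
      // The {img}(path) alternative matched; group 1 holds the path.
      result.push({ img: m[1] });
      continue;
    }
    const [, , tag, body] = m;
    // Second pass: collect the text of any child tags inside the body.
    const inner = /\{(\w+)\}([\s\S]*?)\{\/\1\}/g;
    const children = Array.from(body.matchAll(inner), c => c[2]);
    // A body without child tags is kept as a plain string.
    result.push({ [tag]: children.length ? children : body });
  }
  return result;
}

console.log(parseTags(str));
```

For the sample string this yields the p and img entries as strings and the ul entry as an array of its list item texts. It would not handle deeper nesting or a tag nested inside another tag of the same name.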

I would however go for a different solution:

You could rely on DOMParser, which is available in browsers; if you are on Node, similar functionality is available in several modules.

To use it, you need an XML-formatted string. So unless you want to use <p>-style tags in your markup, you first have to convert your string to that form, making sure that any literal < in the content is escaped as &lt;.

Also, the {img} tag would need a closing tag instead of the parentheses, so a replacement is necessary for that particular case.

Once that is out of the way, it is quite straightforward to get a DOM from that XML. The DOM might already be good enough for you to work with, but it can be reduced to your desired structure with a simple recursive function:

const str = "{p}This is a paragraph{/p} {img}(path/to/image) {ul}{li}This is a list item{/li}{li}Another list item{/li}{/ul}";

// Give {img} a closing tag, escape literal "<", then turn the braces into angle brackets.
const xml = str.replace(/\{img\}\((.*?)\)/g, "{img}$1{/img}")
    .replace(/</g, "&lt;")
    .replace(/\{/g, "<").replace(/\}/g, ">");

const parser = new DOMParser();
const dom = parser.parseFromString("<root>" + xml + "</root>", "application/xml").firstChild;

// Text nodes (nodeType 3) become trimmed strings; elements become { tagName: content }.
const parse = dom => dom.nodeType === 3 ? dom.nodeValue.trim() : {
    [dom.nodeName]: dom.children.length
        ? Array.from(dom.childNodes, parse).filter(Boolean)
        : dom.firstChild.nodeValue
};

const result = parse(dom).root;
console.log(result);

The output is almost what you intended, except that the li elements are also represented as { li: "...." } objects.
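If plain strings are preferred for the list items, one extra pass over the result could unwrap them. This is a minimal sketch: the helper name flattenChildren is hypothetical, and the sample value below is written out by hand in the shape described above instead of being produced by DOMParser, so that the snippet runs on its own.

```javascript
// Sample value in the shape described above (hand-written, not DOMParser output).
const parsed = [
  { p: "This is a paragraph" },
  { img: "path/to/image" },
  { ul: [{ li: "This is a list item" }, { li: "Another list item" }] }
];

// Hypothetical helper: replaces each single-key child object with its value.
const flattenChildren = items => items.map(item => {
  // Every item is assumed to be an object with exactly one key.
  const [[tag, value]] = Object.entries(item);
  return Array.isArray(value)
    ? { [tag]: value.map(child => (typeof child === "string" ? child : Object.values(child)[0])) }
    : item;
});

console.log(flattenChildren(parsed));
```

This only unwraps one level of children, which matches the one-level nesting limit stated in the question.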
