
JavaScript regex to split string with WordPress post content into array (cut at shortcodes)

If you start with a JavaScript string that contains HTML, text and WordPress shortcodes like this example:

<p>some random<br /> text goes here</p> <p>[foo params=&#8221;blue&#8221;]</p> <p>random text in html</p> <p>[bar params=&#8221;baz&#8221;]this has inner content[/bar]</p> <p>last bit of random text<br /> [foobar]this also has inner content [nestedbox params=&#8221;zoo&#8221;]this nest has inner content[/nestedbox][/foobar]</p> 

Is it possible to use a regex to change the string into the following:

array[
 '<p>some random<br /> text goes here</p><p>',
 '[foo params="blue"]',
 '</p> <p>random text in html</p><p>',
  array[
  '[bar params="baz"]',
  'this has inner content',
  '[/bar]'
 ], 
 '</p> <p>last bit of random text<br />',
 array[ 
  '[foobar]',
  'this also has inner content',
   array[
     '[nestedbox params="zoo"]',
       'this nest has inner content',
     '[/nestedbox]'
    ], 
  '[/foobar]'
 ]
];

In short, the regex should only split at shortcodes inside the string, and depending on whether the shortcode is a self-closing one ( [foo ...] ) or an opening/closing pair ( [foobar....]...[/foobar] ), it needs to split recursively as shown above.

After experimenting for a while on https://regex101.com/ , I've only managed to get the various main parts to split (although not quite) with this, and I'm a bit stuck:

/(.*?)\[(.*?)\]/g

How can my current regex be tweaked to output the desired array?

Doing this solely with regular expressions is not possible, because of the nested array structure you need to get. And even if that were not needed, the JavaScript flavour of regexes does not have the power to match nested pairs of opening and closing tags.
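To illustrate where the limit lies: a single regex can still cut the string into a flat list of shortcode tags and text runs; it is only the nesting that needs extra code. A minimal sketch, where `tokenize` is a hypothetical helper name and the pattern assumes a shortcode is a `[`, an optional `/`, a word character, then anything up to the next `]`:

```javascript
// Hypothetical helper: split a string into shortcode tags and text runs.
function tokenize(s) {
  // Either a [tag ...] or [/tag] token, or a run of characters without "[".
  // match() returns null when nothing matches, hence the fallback to [].
  return s.match(/\[\/?\w.*?\]|[^\[]+/g) || [];
}

const tokens = tokenize('<p>[bar params="baz"]inner[/bar]</p>');
console.log(tokens);
// [ '<p>', '[bar params="baz"]', 'inner', '[/bar]', '</p>' ]
```

The result is flat: nothing in it records that `inner` belongs between `[bar ...]` and `[/bar]`.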

So I would suggest using a piece of JavaScript code for that. It might need a bit more testing, as I have only applied it, successfully, to your sample data:

function nest(s) {
    // Split into shortcode tags and runs of text without "["
    var a = s.match(/\[\/?\w.*?\]|[^\[]+/g),
        i = 0,
        closed;
    return (function recurse(endtag) {
        for (var res = [], v; v = a[i]; i++) {
            if (v == endtag) {
                res.push(v);
                return [res]; // return as nested
            } else if (v.match(/^\[\/\w.*?\]$/)) {
                i--;
                return res; // return as non-nested
            } else if (!v.match(/^\[\w.*?\]$/) || !res.length) {
                // normal text or opening tag at start of new part
                res.push(v);
            } else { // opening tag: recurse
                res = res.concat(recurse('[/' + v.match(/\w+/)[0] + ']'));
            }
        }
        return res;
    })();
}

// Sample data
var s = '<p>some random<br /> text goes here</p> <p>[foo params=&#8221;blue&#8221;]</p> <p>random text in html</p> <p>[bar params=&#8221;baz&#8221;]this has inner content[/bar]</p> <p>last bit of random text<br /> [foobar]this also has inner content [nestedbox params=&#8221;zoo&#8221;]this nest has inner content[/nestedbox][/foobar]</p>';

// Call the function
var a = nest(s);

// Show output
console.log(a);
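As a point of comparison, the same nesting can also be built without recursion, by wrapping a group each time a closing tag is met. This is a different technique from the function above, sketched under some assumptions: `nestShortcodes` is a hypothetical name, shortcode names consist of word characters only, and a closing tag with no matching opener is kept as plain text.

```javascript
// Hypothetical iterative variant: wrap [open, ...content, close] on each closing tag.
function nestShortcodes(s) {
  const tokens = s.match(/\[\/?\w.*?\]|[^\[]+/g) || [];
  const out = [];  // result built so far (strings and nested arrays)
  const open = []; // { name, index } for opening tags not closed yet
  for (const tok of tokens) {
    const closing = tok.match(/^\[\/(\w+)/);
    const opening = tok.match(/^\[(\w+)/);
    if (closing) {
      // Look for the most recent unclosed opening tag with the same name.
      let k = open.length - 1;
      while (k >= 0 && open[k].name !== closing[1]) k--;
      if (k >= 0) {
        // Move everything from the opening tag onwards into one group.
        const group = out.splice(open[k].index);
        group.push(tok);
        out.push(group);
        // Drop this entry and any self-closing tags recorded inside it.
        open.length = k;
        continue;
      }
      // Assumption: an unmatched closing tag falls through and is kept as text.
    } else if (opening) {
      open.push({ name: opening[1], index: out.length });
    }
    out.push(tok);
  }
  return out;
}

console.log(JSON.stringify(nestShortcodes('a[foo]b[bar x="1"]c[/bar]d')));
// ["a","[foo]","b",["[bar x=\"1\"]","c","[/bar]"],"d"]
```

An opening tag is only treated as a pair once its closing tag shows up, so self-closing shortcodes such as `[foo]` stay as plain string elements.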

