
Split string with regex skipping brackets []

I have a string and need to split it by whitespace, but if there are words inside brackets, I need to skip them.

For example:

input: 'tree car[tesla BMW] cat color[yellow blue] dog'

output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']

If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.

Also, I've tried to write a regex, but unsuccessfully :(

My last regex looks like this: .split(/(?:(?<=\[).+?(?=\])| )+/), and it returns ["tree", "car[", "]", "cat", "color[", "]", "dog"]

I would be really grateful for any help.

This is easier with match:

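    const input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog';
    // A token is a run of non-bracket, non-whitespace chars, optionally followed by one [...] group
    const output = input.match(/[^[\]\s]+(\[.+?\])?/g);
    console.log(output);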
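If this goes into a reusable helper, one detail is worth handling. String.prototype.match with the g flag returns null, not an empty array, when nothing matches (for example for an empty string). Below is a minimal sketch of the same match-based approach wrapped in a function. The name splitKeepingBrackets is hypothetical, and the code assumes an environment with the ?? operator (Node 14+ or a modern browser):

```javascript
// Hypothetical helper around the match-based approach above.
// Each token is a run of characters other than [, ] and whitespace,
// optionally followed by a single [...] group (nesting is not supported).
function splitKeepingBrackets(input) {
  // match() with the g flag returns null when nothing matches, so fall back to [].
  return input.match(/[^[\]\s]+(\[.+?\])?/g) ?? [];
}

console.log(splitKeepingBrackets('tree car[tesla BMW] cat color[yellow blue] dog'));
console.log(splitKeepingBrackets('')); // empty input yields an empty array, not null
```

The fallback matters mainly for callers that immediately iterate or call .map on the result.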

With split, you need a lookahead like this:

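    const input = 'tree car[tesla BMW] cat color[yellow blue] dog';
    // Split on a space unless it is followed by non-[ chars and then a ] (i.e. it sits inside brackets)
    const output = input.split(/ (?![^[]*\])/);
    console.log(output);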
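Here is what the negative lookahead is doing. A space counts as a separator unless the text after it reaches a ] without first passing a [, and that is exactly the situation inside a bracket group. The small sketch below lists the positions of the spaces the pattern accepts; the variable names are illustrative only:

```javascript
const input = 'tree car[tesla BMW] cat color[yellow blue] dog';
// Same pattern as the split above; matchAll requires the g flag.
const separator = / (?![^[]*\])/g;

// Index of every space that split() would treat as a separator.
const positions = [...input.matchAll(separator)].map(m => m.index);
console.log(positions); // [4, 19, 23, 42]; the spaces at 14 and 36 are inside brackets
```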

Both snippets only work if brackets are not nested; otherwise, you'd need a parser rather than a regexp.
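For nested brackets, a small hand-written scanner is enough. It tracks the current bracket depth and splits on whitespace only at depth 0. This is a minimal sketch, not taken from any of the answers; the helper name splitOutsideBrackets is hypothetical:

```javascript
// Hypothetical helper: splits on whitespace only outside brackets,
// so nested groups such as "car[tesla [model S] BMW]" stay in one token.
function splitOutsideBrackets(input) {
  const tokens = [];
  let current = '';
  let depth = 0;

  for (const ch of input) {
    if (ch === '[') {
      depth++;
    } else if (ch === ']' && depth > 0) {
      depth--; // a stray ] at depth 0 is kept as an ordinary character
    }

    if (/\s/.test(ch) && depth === 0) {
      // Whitespace outside any brackets ends the current token.
      if (current !== '') tokens.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') tokens.push(current);
  return tokens;
}

console.log(splitOutsideBrackets('tree car[tesla [model S] BMW] cat dog'));
```

With this design, an unclosed [ keeps the rest of the string in a single token. If malformed input should be rejected instead, check that depth is 0 at the end and throw otherwise.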

You could split on a space, asserting that it is followed by one or more non-whitespace characters other than square brackets, optionally followed by a group from an opening to a closing square bracket, and then a whitespace boundary on the right.

[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))

Explanation

  • [ ] Match a space (the square brackets are only for clarity)
  • (?= Positive lookahead: assert that what follows is
    • [^\][\s]+ One or more characters other than ], [ or a whitespace character
    • (?:\[[^\][]*])? Optionally a [...] group
    • (?!\S) A whitespace boundary to the right (whitespace or the end of the string)
  • ) Close the lookahead


    // Split on a space only when the next token is a well-formed word or word[...]
    const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;

    [
      "tree car[tesla BMW] cat color[yellow blue] dog",
      "tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
      "tree,test car[tesla BMW]",
      "tree car[tesla BMW] cat color yellow blue] dog",
      "tree car[tesla BMW] cat color[yellow blue dog"
    ].forEach(s => console.log(s.split(regex)));

Here is an option that uses a regex to find all the pieces and then joins them:

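    var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
    // Pieces: [...] terms, single spaces, or words
    var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
    var output = [];
    var idx1 = 0; // position in matches
    var idx2 = 0; // position in output
    do {
      // Spaces only mark token boundaries; skip them
      if (matches[idx1] === " ") {
        ++idx1;
        continue;
      }
      // Concatenate consecutive non-space pieces into one output token
      do {
        output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
        ++idx1;
      } while (matches[idx1] !== " " && idx1 < matches.length);
      ++idx2;
    } while (idx1 < matches.length);
    console.log(output);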

To explain the regex: we handle the [...] terms, which might contain spaces, by trying to match them first. Next, we look for space separators, and finally for standalone words. Here is the regex:

\[.*?\]   find a [...] term
|         OR
[ ]       find a space
|         OR
\b\w+\b   find a word

This gives us the following intermediate array:

["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]

Then we iterate and join together all consecutive non-space entries into the output array, using the actual spaces to indicate where the real separations should happen.
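The joining step can also be written as a single loop that keeps a running token instead of two indices. The sketch below is a restructuring of the same idea with the same regex. It adds a fallback to an empty array because match returns null when nothing matches:

```javascript
const input = 'tree car[tesla BMW] cat color[yellow blue] dog';
// Same regex as above: a [...] term, a single space, or a word.
const parts = input.match(/\[.*?\]|[ ]|\b\w+\b/g) ?? [];

// Concatenate consecutive non-space parts; each space closes the current token.
const output = [];
let current = '';
for (const part of parts) {
  if (part === ' ') {
    if (current !== '') output.push(current);
    current = '';
  } else {
    current += part;
  }
}
if (current !== '') output.push(current);

console.log(output);
```

One caveat applies to both versions: characters outside \w, such as the colon in xml:cat, are not matched by any alternative, so they are silently dropped ("xml:cat" becomes "xmlcat").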

If you insist on using a regex, I recommend looking at this page. The author splits by comma, but I believe you're smart enough to change it to a space.

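Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.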

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM