Split string with regex skipping brackets []
I have a string that I need to split on whitespace, but whitespace inside square brackets should be skipped so that bracketed words stay together.
For example,
input: 'tree car[tesla BMW] cat color[yellow blue] dog'
output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']
If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.
I've also tried to write a regex, but without success. My latest attempt looks like this:
.split(/(?:(?<=\[).+?(?=\])| )+/)
and it returns:
["tree", "car[", "]", "cat", "color[", "]", "dog"]
I would be really grateful for any help.
This is easier with match:
const input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog';
// One token = a run of non-bracket, non-space chars, optionally followed by a [...] group
const output = input.match(/[^[\]\s]+(\[.+?\])?/g);
console.log(output);
With split you need a lookahead like this:
const input = 'tree car[tesla BMW] cat color[yellow blue] dog';
// Split on a space unless a "]" follows before any "[" (i.e. we are inside brackets)
const output = input.split(/ (?![^[]*\])/);
console.log(output);
Both snippets only work if the brackets are not nested; otherwise you'd need a parser rather than a regexp.
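For the nested case mentioned above, a minimal sketch of such a parser might track bracket depth and split only on spaces at depth zero. The function name splitOutsideBrackets is hypothetical (it does not appear in the answers), and the sketch assumes a plain space is the only separator and that stray closing brackets are simply kept as text.

```javascript
// Hypothetical helper: split on spaces that lie outside any [...] group.
// Nesting is handled by counting open brackets instead of using a regex.
function splitOutsideBrackets(text) {
  const parts = [];
  let current = '';
  let depth = 0;
  for (const ch of text) {
    if (ch === '[') {
      depth += 1;
    } else if (ch === ']' && depth > 0) {
      depth -= 1;
    }
    if (ch === ' ' && depth === 0) {
      // Top-level separator: close the current token, skipping empty ones.
      if (current !== '') parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  if (current !== '') parts.push(current);
  return parts;
}

console.log(splitOutsideBrackets('tree car[tesla [model S] BMW] cat'));
// → ['tree', 'car[tesla [model S] BMW]', 'cat']
```

Unlike the regex versions, this treats an unclosed "[" as open until the end of the string, so everything after it ends up in one token; whether that is the desired behavior depends on the input you expect.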
You could split on a space, asserting that to its right there are 1 or more non-whitespace chars other than square brackets, optionally followed by a group from an opening to a closing square bracket, and then a whitespace boundary.
[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))
Explanation
[ ]                Match a space (square brackets only for clarity)
(?=                Positive lookahead
[^\][\s]+          Match 1+ times any char except ], [ or a whitespace char
(?:\[[^\][]*])?    Optionally match from [...]
(?!\S)             A whitespace boundary to the right
)                  Close lookahead
const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;
// Includes inputs with unbalanced brackets to show how they are handled
[
  "tree car[tesla BMW] cat color[yellow blue] dog",
  "tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
  "tree,test car[tesla BMW]",
  "tree car[tesla BMW] cat color yellow blue] dog",
  "tree car[tesla BMW] cat color[yellow blue dog"
].forEach(s => console.log(s.split(regex)));
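To make the breakdown easier to follow, the same pattern can be assembled from its commented parts. This is only an illustrative sketch: the pieces array and the use of new RegExp are assumptions for readability, not how the answer builds the regex.

```javascript
// Each entry is one part of the pattern from the explanation.
// Backslashes are doubled because these are string literals.
const pieces = [
  '[ ]',               // a single space (the split point)
  '(?=',               // open positive lookahead
  '[^\\][\\s]+',       // 1+ chars that are not ], [ or whitespace
  '(?:\\[[^\\][]*])?', // optional [...] group without nested brackets
  '(?!\\S)',           // whitespace boundary: no non-whitespace char follows
  ')',                 // close lookahead
];
const regex = new RegExp(pieces.join(''), 'g');

console.log('tree car[tesla BMW] cat color[yellow blue] dog'.split(regex));
// → ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']
```

Because the space inside "tesla BMW]" is followed by "BMW" and then "]", which is a non-whitespace char, the whitespace boundary fails there and no split happens inside the brackets.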
Here is one regex find-all option:
var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
var output = [];
var idx1 = 0;
var idx2 = 0;
do {
    // A space marks a boundary between output entries
    if (matches[idx1] === " ") {
        ++idx1;
        continue;
    }
    // Concatenate consecutive non-space matches into one output entry
    do {
        output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
        ++idx1;
    } while (matches[idx1] != " " && idx1 < matches.length);
    ++idx2;
} while (idx1 < matches.length);
console.log(output);
As for the regex, we deal with the [...] terms, which might contain spaces, by eagerly trying to match them first. Next, we look for space separators, and finally we look for standalone words. Here is the regex:
\[.*?\] find a [...] term
| OR
[ ] find a space
| OR
\b\w+\b find a word
This gives us the following intermediate array:
["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]
Then we iterate over it and join consecutive non-space entries into single entries of the output array, using the actual spaces to indicate where the real separations should happen.
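The joining step can also be sketched without manual index bookkeeping. The version below is an alternative formulation, not the answer's own code: the helper name joinTokens and the startNew flag are assumptions introduced for illustration.

```javascript
// Hypothetical helper: merge consecutive non-space tokens into one entry,
// treating each " " token as a boundary between entries.
function joinTokens(tokens) {
  const output = [];
  let startNew = true; // true when the next token begins a new entry
  for (const token of tokens) {
    if (token === ' ') {
      startNew = true;
      continue;
    }
    if (startNew) {
      output.push(token);
      startNew = false;
    } else {
      output[output.length - 1] += token;
    }
  }
  return output;
}

const input = 'tree car[tesla BMW] cat color[yellow blue] dog';
const tokens = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
console.log(joinTokens(tokens));
// → ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']
```

The result is the same as the do/while version for this input; the flag simply replaces the second index.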