Split a string by whitespace, keeping quoted segments, allowing escaped quotes
I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:
keywords = 'pop rock "hard rock"';
keywords = keywords.match(/\w+|"[^"]+"/g);
console.log(keywords); // [pop, rock, "hard rock"]
However, I also want it to be possible to have quotes in keywords, like this:
keywords = 'pop rock "hard rock" "\"dream\" pop"';
This should return:
[pop, rock, "hard rock", "\"dream\" pop"]
What's the easiest way to achieve this?
You can change your regex to:
keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);
Instead of [^"]+ you've got (?:\\"|[^"])+ which allows \" or any other character, but not an unescaped quote.
One important note is that if you want the string itself to contain a literal backslash, the backslash must be escaped in the JavaScript string literal:
keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
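To make the note about backslashes concrete, here is a small sketch that runs the pattern above against both spellings of the literal. The variable names (pattern, unescaped, escaped) are only for illustration and are not part of the original answer.

```javascript
// The pattern from the answer: a bare word, or a quoted run in which \" is allowed.
const pattern = /\w+|"(?:\\"|[^"])+"/g;

// Single backslashes: JavaScript drops them, so the string holds plain quotes only.
const unescaped = 'pop rock "hard rock" "\"dream\" pop"';
// Doubled backslashes: the string really contains \" sequences.
const escaped = 'pop rock "hard rock" "\\"dream\\" pop"';

const fromUnescaped = unescaped.match(pattern);
const fromEscaped = escaped.match(pattern);

console.log(fromUnescaped.join(' | ')); // pop | rock | "hard rock" | "dream" | pop
console.log(fromEscaped.join(' | '));   // pop | rock | "hard rock" | "\"dream\" pop"
```

With the single-backslash literal the regex never sees a backslash, so the last keyword falls apart into separate matches.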
Also, there's a slight inconsistency between \w+ and [^"]+: for example, it will match the word "ab*d", but not ab*d (without quotes). Consider using [^"\s]+ instead, which matches any run of characters that are neither whitespace nor quotes.
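The inconsistency is easy to see side by side. The following is a minimal sketch comparing the two unquoted alternatives on the ab*d example; the names wordOnly and nonSpace are made up for this illustration.

```javascript
// Unquoted keywords limited to word characters (\w).
const wordOnly = /\w+|"(?:\\"|[^"])+"/g;
// Unquoted keywords may contain anything except whitespace and quotes.
const nonSpace = /[^"\s]+|"(?:\\"|[^"])+"/g;

const input = 'ab*d "ab*d"';

const withWordOnly = input.match(wordOnly);
const withNonSpace = input.match(nonSpace);

console.log(withWordOnly.join(' | ')); // ab | d | "ab*d"
console.log(withNonSpace.join(' | ')); // ab*d | "ab*d"
```

With \w+ the unquoted ab*d is broken at the asterisk, while the quoted form survives intact.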
ES6 solution:
Code:
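// Tokenize into single characters or backslash-escaped pairs;
// the ^$ alternative keeps match() from returning null on an empty string.
keywords.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
        p.quote ^= 1;                 // toggle the "inside quotes" flag
    } else if (!p.quote && c === ' ') {
        p.a.push('');                 // a space outside quotes starts a new keyword
    } else {
        p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1"); // unescape \x to x
    }
    return p;
}, {a: ['']}).a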
Output:
[ 'pop', 'rock', 'hard rock', '"dream" pop' ]
Kobi's answer works well for the example string, but it fails when there is more than one successive escape character (backslash) between quotes, as Tim Pietzcker noted in the comments. To handle these cases, the pattern can be written like this (for the match method):
(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*
Here (?=\S) ensures there is at least one non-whitespace character at the current position, since the rest of the pattern, which describes all allowed substrings (including whitespace between quotes), is entirely optional.
Details:
(?=\S)             # followed by a non-whitespace
[^"\s]*            # zero or more characters that aren't a quote or a whitespace
(?:                # when a quoted substring occurs:
    "              # opening quote
    [^\\"]*        # zero or more characters that aren't a quote or a backslash
    (?:            # when a backslash is encountered:
        \\ [\s\S]  # an escaped character (including a quote or a backslash)
        [^\\"]*
    )*
    "              # closing quote
    [^"\s]*
)*
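As a rough illustration of the difference, the sketch below feeds both patterns a quoted segment that ends in an escaped backslash. The input string and the names simple and robust are assumptions chosen for this example; String.raw is used only so the backslashes can be written as they appear in the data.

```javascript
// Earlier pattern: only \" is treated as an escape sequence.
const simple = /\w+|"(?:\\"|[^"])+"/g;
// Pattern from this answer: a backslash escapes whatever character follows it.
const robust = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

// The first quoted segment ends with an escaped backslash, i.e. the data is: "C:\\" "next"
const tricky = String.raw`"C:\\" "next"`;

const simpleResult = tricky.match(simple);
const robustResult = tricky.match(robust);

console.log(simpleResult.join(' | ')); // "C:\\" " | next
console.log(robustResult.join(' | ')); // "C:\\" | "next"
```

The earlier pattern reads the second backslash plus the closing quote as an escaped quote and runs on into the next segment, whereas this pattern consumes the two backslashes as one escaped pair. Note that the sketch only uses balanced quotes; input with an unclosed quote is not covered here.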
I would like to point out that I had the same regex as you,
/\w+|"[^"]+"/g
but it didn't work on empty quoted strings such as:
"" "hello" "" "hi"
so I had to replace the + quantifier with *. This gave me:
str.match(/\w+|"[^"]*"/g);
Which works fine.
(ex: https://regex101.com/r/wm5puK/1 )
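For readers who want to check this without opening the link, here is a small sketch running both quantifiers on the sample string above. The names withPlus and withStar are only for illustration.

```javascript
const sample = '"" "hello" "" "hi"';

// + requires at least one character between the quotes.
const withPlus = sample.match(/\w+|"[^"]+"/g);
// * also accepts an empty pair of quotes.
const withStar = sample.match(/\w+|"[^"]*"/g);

console.log(withPlus.join(' | ')); // " " | hello | " " | " " | hi
console.log(withStar.join(' | ')); // "" | "hello" | "" | "hi"
```

With +, the regex skips the first quote of each empty pair and then pairs up the wrong quotes, so the spaces between segments end up quoted instead.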