
Split a string by whitespace, keeping quoted segments, allowing escaped quotes

I currently have this regular expression to split strings by all whitespace, unless it's in a quoted segment:

keywords = 'pop rock "hard rock"';
keywords = keywords.match(/\w+|"[^"]+"/g);
console.log(keywords); // [pop, rock, "hard rock"]

However, I also want it to be possible to have quotes in keywords, like this:

keywords = 'pop rock "hard rock" "\"dream\" pop"';

This should return

[pop, rock, "hard rock", "\"dream\" pop"]
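For reference, here is a small sketch (not part of the original question) of what the current regex does with that input. It assumes the backslashes are doubled in the JavaScript literal, because inside single quotes \" is just a plain quote; the variable names are arbitrary.

```javascript
// Doubled backslashes, so the string really contains \"dream\"
const keywords = 'pop rock "hard rock" "\\"dream\\" pop"';

// The original pattern: a word, or a quoted run containing no quote at all
const result = keywords.match(/\w+|"[^"]+"/g);

console.log(result);
// → [ 'pop', 'rock', '"hard rock"', '"\\"', 'dream', '" pop"' ]
```

The escaped quote ends the quoted segment early, so "\"dream\" pop" is split into three unrelated pieces.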

What's the easiest way to achieve this?

You can change your regex to:

keywords = keywords.match(/\w+|"(?:\\"|[^"])+"/g);

Instead of [^"]+ you've got (?:\\"|[^"])+, which allows \" or any other character, but not an unescaped quote.

One important note: if you want the string to contain a literal backslash, it should be written as:

keywords = 'pop rock "hard rock" "\\"dream\\" pop"'; // note the escaped backslashes
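Putting the modified pattern together with the correctly escaped input gives a runnable check. This is a minimal sketch; the variable names are arbitrary.

```javascript
// Doubled backslashes: the string actually contains "\"dream\" pop"
const keywords = 'pop rock "hard rock" "\\"dream\\" pop"';

// A word, or a quoted run whose elements are either \" or any non-quote character
const result = keywords.match(/\w+|"(?:\\"|[^"])+"/g);

console.log(result);
// → [ 'pop', 'rock', '"hard rock"', '"\\"dream\\" pop"' ]
```

The matches keep their surrounding quotes and escaping backslashes; stripping them is left to the caller.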

Also, there's a slight inconsistency between \w+ and [^"]+: for example, "ab*d" is matched as one keyword when quoted, but ab*d without quotes is split into ab and d. Consider using [^"\s]+ instead, which matches any run of characters that are neither quotes nor whitespace.
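A quick comparison of the two alternations on the same input (an illustrative sketch; the input string is made up for the example):

```javascript
const input = 'ab*d "ab*d"';

// \w+ stops at '*', so the unquoted word is split into two matches
const withWordChars = input.match(/\w+|"(?:\\"|[^"])+"/g);

// [^"\s]+ accepts any run of non-quote, non-whitespace characters
const withNonSpaces = input.match(/[^"\s]+|"(?:\\"|[^"])+"/g);

console.log(withWordChars); // [ 'ab', 'd', '"ab*d"' ]
console.log(withNonSpaces); // [ 'ab*d', '"ab*d"' ]
```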

An ES6 solution that:

  • Splits by space, except inside quotes
  • Removes quotes, but not backslash-escaped quotes
  • Turns escaped quotes into plain quotes
  • Allows quotes anywhere in a keyword

Code:

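const keywords = 'pop rock "hard rock" "\\"dream\\" pop"';

// Split into single characters, keeping each backslash escape (\x) as one unit;
// the ^$ alternative makes match() return [''] rather than null for an empty string.
const result = keywords.match(/\\?.|^$/g).reduce((p, c) => {
        if (c === '"') {
            p.quote ^= 1; // toggle the "inside quotes" flag
        } else if (!p.quote && c === ' ') {
            p.a.push(''); // an unquoted space starts a new keyword
        } else {
            p.a[p.a.length - 1] += c.replace(/\\(.)/, "$1"); // \x becomes x
        }
        return p;
    }, {a: [''], quote: 0}).a;

console.log(result);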

Output:

[ 'pop', 'rock', 'hard rock', '"dream" pop' ]
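To see the "quotes anywhere" behaviour and a couple of edge cases, the same reduce logic can be wrapped in a function. The name splitKeywords is hypothetical and not part of the original answer.

```javascript
// Hypothetical wrapper around the reduce-based tokenizer
function splitKeywords(input) {
  return input.match(/\\?.|^$/g).reduce((p, c) => {
    if (c === '"') {
      p.quote ^= 1; // toggle the quoted state
    } else if (!p.quote && c === ' ') {
      p.a.push(''); // unquoted space: start a new keyword
    } else {
      p.a[p.a.length - 1] += c.replace(/\\(.)/, '$1'); // drop the escaping backslash
    }
    return p;
  }, { a: [''], quote: 0 }).a;
}

console.log(splitKeywords('a"b c"d'));   // [ 'ab cd' ]  (quotes inside a keyword)
console.log(splitKeywords('pop  rock')); // [ 'pop', '', 'rock' ]  (two spaces in a row)
console.log(splitKeywords(''));          // [ '' ]
```

Consecutive unquoted spaces produce empty keywords. Appending .filter(s => s !== '') removes them, at the cost of also dropping intentionally empty quoted keywords such as "".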

Although Kobi's answer works well for the example string, it fails when there is more than one successive escape character (backslash) between quotes, as Tim Pietzcker noticed in the comments. To handle these cases, the pattern can be written like this (for the match method):

(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*


Here (?=\S) ensures there is at least one non-whitespace character at the current position. This is needed because everything that follows, which describes all the allowed substrings (including whitespace between quotes), is entirely optional and would otherwise also match the empty string.
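A small sketch of what the lookahead prevents, comparing the pattern with and without it on a made-up input:

```javascript
// Same pattern, with and without the leading (?=\S) lookahead
const withLookahead = /(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;
const withoutLookahead = /[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g;

const input = 'a b';

const good = input.match(withLookahead);
const bad = input.match(withoutLookahead); // empty matches at the space and at the end

console.log(good); // [ 'a', 'b' ]
console.log(bad);  // [ 'a', '', 'b', '' ]
```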

Details:

(?=\S)        # followed by a non-whitespace character
[^"\s]*       # zero or more characters that aren't a quote or whitespace
(?:           # each time a quoted substring occurs:
    "         # opening quote
    [^\\"]*   # zero or more characters that aren't a quote or a backslash
    (?:       # each time a backslash is encountered:
        \\ [\s\S] # an escaped character (including a quote or a backslash)
        [^\\"]*   # more characters that aren't a quote or a backslash
    )*
    "         # closing quote
    [^"\s]*   # characters after the closing quote that aren't a quote or whitespace
)*
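The difference from Kobi's pattern shows up with an escaped backslash right before a closing quote. This sketch uses a made-up input and runs both patterns side by side:

```javascript
// The string contains: "x\\" "y"  (an escaped backslash, then the real closing quote)
const input = '"x\\\\" "y"';

// Kobi's pattern reads the second backslash plus the quote as an escaped quote
// and runs on until the next quote it finds
const kobi = input.match(/\w+|"(?:\\"|[^"])+"/g);

// This pattern consumes backslash escapes in pairs, so \\ is one escaped backslash
const paired = input.match(/(?=\S)[^"\s]*(?:"[^\\"]*(?:\\[\s\S][^\\"]*)*"[^"\s]*)*/g);

console.log(kobi);   // [ '"x\\\\" "', 'y' ]
console.log(paired); // [ '"x\\\\"', '"y"' ]
```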

I'd like to point out that I had the same regex as you,

/\w+|"[^"]+"/g

but it didn't work on empty quoted strings such as:

"" "hello" "" "hi"

so I had to change the + quantifier to *. This gave me:

str.match(/\w+|"[^"]*"/g);

which works.

(example: https://regex101.com/r/wm5puK/1)
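A side-by-side sketch of the two quantifiers on that input. It shows that with + the empty pairs are not just skipped: the regex pairs up the wrong quotes and returns the spaces between them.

```javascript
const str = '"" "hello" "" "hi"';

// With +, "" cannot match, so a quote is paired with the next one across a space
const plus = str.match(/\w+|"[^"]+"/g);

// With *, zero characters between the quotes are allowed
const star = str.match(/\w+|"[^"]*"/g);

console.log(plus); // [ '" "', 'hello', '" "', '" "', 'hi' ]
console.log(star); // [ '""', '"hello"', '""', '"hi"' ]
```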
