
Matching items in a comma-delimited list which aren't surrounded by single or double quotes

I want to match every item of text in a comma-delimited list. For this, the following regular expression works great:

/[^,]+/g

(Regex101 demo)

The problem is that I want to ignore any commas contained within single or double quotes, and I'm unsure how to extend the above pattern to do that.

Here's an example string:

abcd, efgh, ij"k,l", mnop, 'q,rs't

I want to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):

  1. abcd
  2. efgh
  3. ij"k,l"
  4. mnop
  5. 'q,rs't

Or:

abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
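For reference, this is what the original /[^,]+/g pattern does with the example string (a small sketch; the variable names are illustrative). The commas inside "k,l" and 'q,rs't also end items, so it yields seven pieces instead of five.

```javascript
// The example string from the question.
const example = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// The original pattern: every run of non-comma characters.
const naive = example.match(/[^,]+/g);

console.log(naive.length); // 7: the quoted commas split items too
console.log(naive.join(" | ")); // abcd |  efgh |  ij"k | l" |  mnop |  'q | rs't
```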

How can I do this?


Three related questions exist, but none of them handles both ' and " in JavaScript:

  1. Regex for splitting a string using space when not surrounded by single or double quotes (a Java solution; doesn't appear to work in JavaScript)
  2. A regex to match a comma that isn't surrounded by quotes (only handles ")
  3. Alternative to regex: match all instances not inside quotes (only handles ")

Okay, so each item can be made up of:

  • Runs of characters other than commas and quotes
  • A matching pair of " (with no " in between)
  • A matching pair of ' (with no ' in between)

So this should work:

/((?:[^,"']+|"[^"]*"|'[^']*')+)/g

RegEx101 Demo
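A minimal JavaScript sketch of using this pattern with match(); splitItems is a hypothetical helper name. Because each match keeps the space that follows a comma, the sketch trims the results.

```javascript
// Pattern from this answer: runs of non-comma/non-quote characters, or whole
// "..." / '...' sections, repeated so that one item can mix them.
const itemPattern = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

function splitItems(input) {
  // match() returns null when nothing matches, so fall back to an empty array.
  // Matches keep the leading space after each comma, hence the trim().
  return (input.match(itemPattern) || []).map((item) => item.trim());
}

const items = splitItems(`abcd, efgh, ij"k,l", mnop, 'q,rs't`);
console.log(items.join(" | ")); // abcd | efgh | ij"k,l" | mnop | 'q,rs't
```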

As a nice bonus, you can put single quotes inside double-quoted sections, and vice versa. However, you'll probably need a state machine to handle escaped double quotes inside double-quoted strings (e.g. "aa\"aa").

Unfortunately, it also matches the leading space after each comma, so you'll have to trim the matches.
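For the escaped-quote case, a state machine is one option. The sketch below is a hypothetical splitOutsideQuotes helper, not part of this answer: it walks the string one character at a time, tracks whether it is inside a '...' or "..." section, assumes a backslash escapes the next character, and splits only on commas seen outside quotes. It does not validate input: an unclosed quote simply swallows the rest of the string.

```javascript
// Hypothetical helper: split on commas outside '...' / "..." sections,
// treating a backslash as an escape for the following character (assumption).
function splitOutsideQuotes(input) {
  const items = [];
  let current = "";
  let quote = null; // null outside quotes, otherwise the active quote character
  let escaped = false; // true right after a backslash

  for (const ch of input) {
    if (escaped) {
      current += ch; // keep the escaped character literally
      escaped = false;
    } else if (ch === "\\") {
      current += ch; // keep the backslash so the item text is unchanged
      escaped = true;
    } else if (quote !== null) {
      current += ch;
      if (ch === quote) quote = null; // closing quote
    } else if (ch === '"' || ch === "'") {
      current += ch;
      quote = ch; // opening quote
    } else if (ch === ",") {
      items.push(current.trim()); // comma outside quotes ends the item
      current = "";
    } else {
      current += ch;
    }
  }
  items.push(current.trim());
  return items;
}

console.log(splitOutsideQuotes(String.raw`a, "b\"c,d", e`).join(" | ")); // a | "b\"c,d" | e
```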

Using two lookaheads to make sure the matched comma is outside quotes:

/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
  • (?=(([^"]*"){2})*[^"]*$) asserts that there is an even number of double quotes between the matched comma and the end of the string.
  • (?=(([^']*'){2})*[^']*$) makes the same assertion for single quotes.

PS: This doesn't handle unbalanced, nested, or escaped quotes.

RegEx Demo
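A sketch of using this lookahead approach with split() in JavaScript, which the question asks for. One change from the answer's regex: the groups are written as non-capturing (?:...), because split() inserts the text of every capturing group into its result array. Note that both lookaheads rescan the rest of the string at each candidate position, so the cost grows roughly quadratically with the input length.

```javascript
// The answer's regex with non-capturing groups, so split() returns only the items.
// Each lookahead requires an even number of " (resp. ') from here to the end.
const commaOutsideQuotes =
  /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

const parts = `abcd, efgh, ij"k,l", mnop, 'q,rs't`.split(commaOutsideQuotes);
console.log(parts.join(" | ")); // abcd | efgh | ij"k,l" | mnop | 'q,rs't
```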

Try this in JavaScript:

(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+

Demo

The same pattern with named groups for readability (remove the ?<name> parts for JavaScript engines that predate ES2018 named capture groups):

(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)

Demo
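With an engine that supports named groups (ES2018 and later), the named version can be used as-is with matchAll() and m.groups. classifyItems is a hypothetical helper for illustration; the trim() is added because matches keep the space after each comma.

```javascript
// The named-group pattern from this answer, unchanged (needs ES2018+ named groups).
const namedItemPattern =
  /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

// Hypothetical helper: report each item, whether it contains quotes, and the
// quoted section captured in the last repetition (null for simple items).
function classifyItems(input) {
  return [...input.matchAll(namedItemPattern)].map((m) => ({
    text: m[0].trim(),
    kind: m.groups.has_quotes !== undefined ? "has_quotes" : "simple",
    quoted: m.groups.double_quotes ?? m.groups.single_quotes ?? null,
  }));
}

for (const item of classifyItems(`abcd, efgh, ij"k,l", mnop, 'q,rs't`)) {
  console.log(item.kind, item.text, item.quoted);
}
```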

Explanation:

(?<double_quotes>"[^"\n]*") matches ", then anything except " or a newline, then " = (1) (double-quoted section)
(?<single_quotes>'[^'\n]*') matches ', then anything except ' or a newline, then ' = (2) (single-quoted section)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*')) matches (1) or (2) = (3)
[^,"'\n]* matches any text except ", ', commas and newlines = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*) matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+ matches (3)(w) repeated = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+) matches (w)(3w+) = (4) (has quotes)
[^,\n]+ matches every other case = (5) (simple)
So in the end we have (4)|(5) (has quotes, or simple).
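To see how the numbered pieces combine, the sketch below assembles the pattern from (1), (2), (3), (w), (4) and (5) with String.raw and new RegExp. The variable names are illustrative; (4) follows the (w)(3w+) form from the explanation, and named groups need ES2018 or later.

```javascript
// Build the (4)|(5) pattern from the pieces described above.
const dq = String.raw`(?<double_quotes>"[^"\n]*")`; // (1) double-quoted section
const sq = String.raw`(?<single_quotes>'[^'\n]*')`; // (2) single-quoted section
const quoted = `(?:${dq}|${sq})`; // (3) = (1) or (2)
const w = String.raw`[^,"'\n]*`; // (w) text without commas, quotes or newlines
const hasQuotes = `(?<has_quotes>${w}(?:${quoted}${w})+)`; // (4) = (w)(3w+)
const simple = String.raw`(?<simple>[^,\n]+)`; // (5) every other case
const itemRegex = new RegExp(`${hasQuotes}|${simple}`, "g");

// Report which alternative matched each item of the example string.
const found = [...`abcd, efgh, ij"k,l", mnop, 'q,rs't`.matchAll(itemRegex)].map(
  (m) => (m.groups.has_quotes !== undefined ? "has_quotes" : "simple")
);
console.log(found.join(" ")); // simple simple has_quotes simple has_quotes
```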

Input:

abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""

Output:

MATCH 1
simple  [0-4]   `abcd`
MATCH 2
simple  [5-9]   `efgh`
MATCH 3
simple  [10-15] ` ijkl`
MATCH 4
simple  [16-20] `abcd`
MATCH 5
simple  [21-26] ` efgh`
MATCH 6
has_quotes  [27-35] ` ij"k,l"`
double_quotes   [30-35] `"k,l"`
MATCH 7
simple  [36-41] ` mnop`
MATCH 8
has_quotes  [42-50] ` 'q,rs't`
single_quotes   [43-49] `'q,rs'`
MATCH 9
has_quotes  [51-59] `'q, rs't`
single_quotes   [51-58] `'q, rs'`
MATCH 10
has_quotes  [60-74] `"'q,rs't, ij"k`
double_quotes   [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes  [75-79] ` l""`
double_quotes   [77-79] `""`

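Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.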

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM