Regex - how to match multiple properly quoted substrings

I am trying to use a regex to extract quote-wrapped strings from a (C#) string that is a comma-separated list of such strings. I need to extract all properly quoted substrings and ignore those that are missing a quote mark.

E.g. given this string:

"animal,dog,cat","ecoli, verification,"streptococcus"

I need to extract "animal,dog,cat" and "streptococcus".

I've tried various regex solutions from this forum, but they all seem either to find only the first substring, or to incorrectly match "ecoli, verification," and ignore "streptococcus".

Is this solvable?

TIA

Try this:

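using System;
using System.Text.RegularExpressions;

string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
// Opening quote, a lazy run of non-quote characters, one non-comma character,
// then the closing quote.
string pattern = "\"([^\"]+?[^,])\"";

var matches = Regex.Matches(input, pattern);

// Group 1 holds the text between the quotes.
foreach (Match m in matches)
    Console.WriteLine(m.Groups[1].Value);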

PS: But I agree with the commenters: fix the source.

I suggest this:

"(?>[^",]*(?>,[^",]+)*)"

Explanation:

"        # Match a starting quote
(?>      # Atomic (non-capturing) group, to avoid catastrophic backtracking:
 [^",]*  # - any number of characters except commas or quotes
 (?>     # - optionally followed by another (atomic) group:
  ,      #   - which starts with a comma
  [^",]+ #   - and contains at least one character besides comma or quotes.
 )*      # - (as said above, that group is optional but may occur many times)
)        # End of the outer atomic group
"        # Match a closing quote

Test it live on regex101.com.
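
For completeness, here is a rough sketch of how that pattern could be used from C#. The helper name ExtractQuoted is made up for illustration, and the extra pair of parentheses (a capturing group around the atomic group) is my addition so the content can be read without the surrounding quotes; neither is part of the pattern as posted. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the answer): returns every properly quoted
// substring with its surrounding quotes removed.
static string[] ExtractQuoted(string input)
{
    // The pattern from the answer, as a verbatim string ("" is one quote),
    // wrapped in an added capturing group (group 1) for the content.
    const string pattern = @"""((?>[^"",]*(?>,[^"",]+)*))""";

    return Regex.Matches(input, pattern)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToArray();
}

string sample = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";

// Prints:
// animal,dog,cat
// streptococcus
foreach (string value in ExtractQuoted(sample))
    Console.WriteLine(value);
```

The "ecoli, verification," part is skipped because a comma must be followed by at least one character that is neither a comma nor a quote, so the pattern cannot reach a closing quote there.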
