Regex - how to match multiple properly quoted substrings

I am trying to use a regex to extract quote-wrapped strings from a C# string that holds a comma-separated list of such strings. I need to extract all properly quoted substrings and ignore those that are missing a quotation mark.

For example, given this string:

"animal,dog,cat","ecoli, verification,"streptococcus"

I need to extract "animal,dog,cat" and "streptococcus".

I've tried various regex solutions from this forum, but they all either find only the first substring, or incorrectly match "ecoli, verification," and ignore "streptococcus".

Is this solvable?

TIA

Try this:

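using System;
using System.Text.RegularExpressions;

string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
// Opening quote, a lazy run of non-quote characters, one character that is not a comma, closing quote.
// Group 1 holds the text between the quotes; it needs at least two characters to match.
string pattern = "\"([^\"]+?[^,])\"";

var matches = Regex.Matches(input, pattern);

foreach (Match m in matches)
    Console.WriteLine(m.Groups[1].Value);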

P.S. I agree with the commenters, though: fix the source.

I suggest this:

"(?>[^",]*(?>,[^",]+)*)"

Explanation:

"        # Match a starting quote
(?>      # Start an atomic (non-capturing) group to avoid catastrophic backtracking:
 [^",]*  # - any number of characters except commas or quotes
 (?>     # - optionally followed by another (atomic) group:
  ,      #   - which starts with a comma
  [^",]+ #   - and contains at least one character besides comma or quotes.
 )*      # - (as said above, that group is optional but may occur many times)
)        # End of the outer atomic group
"        # Match a closing quote

Test it live on regex101.com.
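
For reference, here is a minimal sketch of how this pattern might be used from C#. The class name QuotedFieldDemo and the helper ExtractQuoted are made up for illustration, and the extra pair of capturing parentheses around the atomic group is an addition that is not part of the pattern above; it is only there so that Groups[1] gives the text without the surrounding quotes (m.Value would include them).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedFieldDemo
{
    // Verbatim string: each "" stands for one literal quote character.
    // The outer ( ... ) is an added capturing group around the atomic group.
    private static readonly Regex Quoted =
        new Regex(@"""((?>[^"",]*(?>,[^"",]+)*))""");

    // Hypothetical helper: returns the contents of every properly quoted field.
    public static List<string> ExtractQuoted(string input)
    {
        var result = new List<string>();
        foreach (Match m in Quoted.Matches(input))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        string input = "\"animal,dog,cat\",\"ecoli, verification,\"streptococcus\"";
        foreach (string field in ExtractQuoted(input))
            Console.WriteLine(field);
    }
}
```

With the sample string from the question, this should print animal,dog,cat and streptococcus on separate lines. The piece starting at "ecoli is skipped because its last comma is followed directly by a quote, so the inner group cannot match and no closing quote is found at that position.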
