
Regex exclude pattern for string to array

For a project, I need to separate string values into a list of strings. The strings are built as follows:

string unprocessed = "\"foo,bar\",\"foobar\",\"shizzle ma nizzle\"";

I want to get it into an array like the following:

string[] processed = new[] { "\"foo,bar\"", "\"foobar\"", "\"shizzle ma nizzle\"" };

For this, I'm using a regex match that separates the string on the "," character combination (quote, comma, quote). The code I have so far is as follows:

Regex reg = new Regex(@"((?!(,""|"",)).)+");
string regmatch = "\"\"wubba,lubba\",\"dup dub\"\"";
var matches =  reg.Matches(regmatch);

Assert.AreEqual(2, matches.Count);
Assert.AreEqual("\"dup dub\"\"", matches[1].Value); // passes
Assert.AreEqual("\"\"wubba,lubba\"", matches[0].Value); // fails because value = \"\"wubba,lubba

So far I'm getting one slight error, as seen in the example code: the first match is missing its closing quote. I think I'm almost there. Can someone help me solve this regex issue? Or is there a better way to do this?

Just capture the sequences that start and end with a quote and have only non-quote characters inside:

var processed = Regex.Matches(unprocessed, "\"[^\"]+\"")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();

Output:

[
  "\"foo,bar\"",
  "\"foobar\"",
  "\"shizzle ma nizzle\""
]

If a plain enumerable is good enough for you, you can use a simple query instead:

var processed = from Match m in Regex.Matches(unprocessed, "\"[^\"]+\"")
                select m.Value;

Since your requirement also mandates that you capture multiple redundant quotation marks in any given substring (why???), a tweak of Sergey Berezovskly's pattern should yield the desired results:

var processed = Regex.Matches(unprocessed, "\"+[^\"]+\"+")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToList();
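
To see the difference between the two patterns, here is a small sketch that runs both against the test string from the question. The class name QuotePatternDemo and the helper Extract are made up for illustration; only Regex.Matches and the LINQ calls come from the answers above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotePatternDemo
{
    // Hypothetical helper: returns every match of the pattern as a string array.
    public static string[] Extract(string input, string pattern) =>
        Regex.Matches(input, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // Test string from the question: ""wubba,lubba","dup dub""
        string regmatch = "\"\"wubba,lubba\",\"dup dub\"\"";

        // Single-quote pattern: the redundant outer quotes are dropped.
        foreach (var s in Extract(regmatch, "\"[^\"]+\""))
            Console.WriteLine(s);   // "wubba,lubba"  then  "dup dub"

        // Tweaked pattern: runs of quotes on either side are kept.
        foreach (var s in Extract(regmatch, "\"+[^\"]+\"+"))
            Console.WriteLine(s);   // ""wubba,lubba"  then  "dup dub""
    }
}
```

With the tweaked pattern both assertions from the question should pass, since the matches are compared against the values including their extra quotes.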

Parsing CSV with Regex is the second worst method that I know of. For example, the value a"b,c" is written in CSV as "a""b,c""", which can't be reliably parsed with a regex and will leave the escaped "" in the result.
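
As a rough illustration of that point, the sketch below feeds the single CSV field "a""b,c""" to the quote-matching pattern from the first answer. The class and method names are hypothetical; the pattern is the one shown earlier on this page.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedQuoteDemo
{
    // Hypothetical helper: applies the simple quote pattern to one CSV line.
    public static string[] Extract(string line) =>
        Regex.Matches(line, "\"[^\"]+\"")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        // One CSV field holding the value a"b,c" (quotes doubled as escapes).
        string line = "\"a\"\"b,c\"\"\"";

        // The pattern cannot tell an escaped quote from a closing quote,
        // so the single field comes back as two separate matches.
        foreach (var s in Extract(line))
            Console.WriteLine(s);   // "a"  then  "b,c"
    }
}
```

The one field is split in two and the original value cannot be recovered from the matches, which is why a real CSV parser is the safer choice when escaped quotes can occur.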

I would recommend looking for a dedicated CSV parser such as CsvReader, FileHelpers, LINQtoCSV, etc. If an external library is not an option, there is Microsoft.VisualBasic.FileIO.TextFieldParser.
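
A minimal sketch of the TextFieldParser route, assuming the project can reference the Microsoft.VisualBasic assembly. The class name CsvLineDemo and the method ParseLine are made up for this example. Note that the parser returns the field values without their surrounding quotes, which differs from the array asked for in the question.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvLineDemo
{
    // Hypothetical helper: parses a single comma-delimited line into fields.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside quoted fields are treated as data, not delimiters.
            parser.HasFieldsEnclosedInQuotes = true;
            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }

    public static void Main()
    {
        string unprocessed = "\"foo,bar\",\"foobar\",\"shizzle ma nizzle\"";

        foreach (var field in ParseLine(unprocessed))
            Console.WriteLine(field);   // foo,bar  then  foobar  then  shizzle ma nizzle
    }
}
```

If the surrounding quotes are really needed in the result, they can be added back afterwards, for example with Select(f => "\"" + f + "\"").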

Parsing CSV files in C#, with header

