Parsing CSV files with multiple formats in C# using regex

I have been trying to parse a CSV file with three fields. The first two fields are simple and easy to extract. The problem is the third field, which is a free-form string and can therefore contain special characters, including the ',' that delimits the fields. I tried enclosing the string field in double quotes, but my requirement is that a simple string (one without special characters) may also appear without double quotes. I also need to handle line breaks inside the string. Below is a sample CSV file.

123,true,This is a memo
234,false,"This is also a memo"
345,true,
456,true,Above me is a blank memo
567,false,"This has a ,
in it"
678,true,This has a , in it <--- This record should be rejected
789,false,""
890,true,Above me is also a valid blank memo

I also found a good tool for testing the regex format string at http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

So far I have used the following format string: ^(""(?:[^""]|"""")*""|[^,]*),(""(?:[^""]|"""")*""|[^,]*)$

The problem with this format string is that it does not handle records that span multiple lines, and it does not reject a string that starts with a double quote but has no closing double quote.
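One way to cover both gaps with a single pattern is sketched below. It is only a sketch under assumptions taken from the sample data: the first field is all digits, the second is true or false, and a quoted memo escapes an embedded quote by doubling it. The class name MemoCsv and the Parse method are hypothetical names for illustration. The pattern is anchored with \G so that each record must begin exactly where the previous one ended, and the quoted alternative uses [^""], which also matches line breaks, so a quoted memo can span lines. A memo that opens a quote and never closes it fails both alternatives and is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MemoCsv
{
    // One record: digits, true/false, then a memo that is either quoted
    // (may contain commas, line breaks and doubled quotes) or unquoted
    // (no commas, quotes or line breaks). Assumed field formats, see above.
    private static readonly Regex Record = new Regex(
        @"\G(?<id>\d+),(?<flag>true|false)," +
        @"(?:""(?<memo>(?:[^""]|"""")*)""|(?<memo>[^,""\r\n]*))" +
        @"(?:\r?\n|\z)");

    // Returns the accepted records as {id, flag, memo};
    // lines that do not start a valid record are added to 'rejected'.
    public static List<string[]> Parse(string text, List<string> rejected)
    {
        var records = new List<string[]>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = Record.Match(text, pos);
            if (m.Success)
            {
                // Collapse doubled quotes; a no-op for unquoted memos.
                string memo = m.Groups["memo"].Value.Replace("\"\"", "\"");
                records.Add(new[] { m.Groups["id"].Value, m.Groups["flag"].Value, memo });
                pos = m.Index + m.Length;
            }
            else
            {
                // No record starts here: report the line (unless empty) and skip it.
                int eol = text.IndexOf('\n', pos);
                int end = eol < 0 ? text.Length : eol;
                string line = text.Substring(pos, end - pos).TrimEnd('\r');
                if (line.Length > 0) rejected.Add(line);
                pos = eol < 0 ? text.Length : eol + 1;
            }
        }
        return records;
    }
}
```

The whole file has to be in memory as one string (for example via File.ReadAllText), because the pattern must be able to see past a line break inside a quoted memo. Recovery after a bad record is line-based, so an unterminated quote costs only the line it starts on.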

Thanks in advance.


Thanks for the help, guys, but I needed to parse custom data in CSV and had to create my own custom parser. I parse each field separately and apply regexes in small chunks.
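The actual parser was not posted, so the following is only a guess at what a field-by-field approach with small regexes could look like, not the code described above. All names (FieldByFieldParser, ReadRecord) are hypothetical, and the field formats (digits, true/false, doubled quotes as the escape) are assumptions based on the sample data. Each field gets its own small pattern, and a memo with an odd number of quotes is treated as still open, so further physical lines are appended to it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldByFieldParser
{
    // One small pattern per field (assumed formats, see above).
    private static readonly Regex Id = new Regex(@"^\d+\z");
    private static readonly Regex Flag = new Regex(@"^(?:true|false)\z");
    private static readonly Regex QuotedMemo = new Regex(@"^""(?:[^""]|"""")*""\z");
    private static readonly Regex PlainMemo = new Regex(@"^[^,""]*\z");

    // Returns the next record as {id, flag, memo}, or null at end of input.
    // Throws FormatException for a record that fails any field check.
    public static string[] ReadRecord(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null) return null;

        // Split off the two simple fields; everything after the second comma is the memo.
        string[] parts = line.Split(new[] { ',' }, 3);
        if (parts.Length != 3 || !Id.IsMatch(parts[0]) || !Flag.IsMatch(parts[1]))
            throw new FormatException("Bad record: " + line);

        string memo = parts[2];
        // An odd number of quotes means a quoted memo is still open,
        // so the record continues on the next physical line.
        while (memo.Count(c => c == '"') % 2 == 1)
        {
            string next = reader.ReadLine();
            if (next == null) break;
            memo += "\n" + next;
        }

        if (QuotedMemo.IsMatch(memo))
        {
            // Strip the enclosing quotes and collapse doubled quotes.
            string inner = memo.Substring(1, memo.Length - 2).Replace("\"\"", "\"");
            return new[] { parts[0], parts[1], inner };
        }
        if (PlainMemo.IsMatch(memo))
            return new[] { parts[0], parts[1], memo };
        throw new FormatException("Bad memo: " + memo);
    }
}
```

A caller would loop on ReadRecord until it returns null and catch FormatException to log rejected records; the reader is already positioned on the next line when the exception is thrown. One trade-off of this design is that an unterminated quote swallows every remaining line before the record is rejected.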

There is no need to reinvent the wheel. I recommend using an existing CSV parser; there are many good alternatives.

I have had great success with CsvReader; it is very fast and easy to use. Basic usage:

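// Assumes the LumenWorks.Framework.IO.Csv package, whose CsvReader matches the calls below.
using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

// The second argument tells the reader that the first row holds the headers.
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();

    // Each ReadNextRecord call advances to the next record; fields are read by index.
    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.Write("{0} = {1};", headers[i], csv[i]);

        Console.WriteLine();
    }
}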
