
Parsing CSV files with multiple formats in C# using regex

I have been trying to parse a CSV file with three fields. The first two fields are simple and easy to extract. The problem is the third field, which is a free-form string and can therefore contain special characters, including the comma (',') that delimits the fields. I tried enclosing the string field in double quotes ('"'), but my requirement is that a simple string (one without special characters) may also appear without double quotes. I also need to handle line breaks inside the string. Below is a sample of the CSV file.

123,true,This is a memo
234,false,"This is also a memo"
345,true,
456,true,Above me is a blank memo
567,false,"This has a ,
in it"
678,true,This has a , in it <--- This record should be rejected
789,false,""
890,true,Above me is also a valid blank memo

I also found a good tool for testing regex format strings at http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Until now I have used the following format string: ^(""(?:[^""]|"""")*""|[^,]*),(""(?:[^""]|"""")*""|[^,]*)$

The problem with this format string is that it does not handle fields that span multiple lines, and it does not reject a string that has an opening double quote but no closing double quote.

Thanks in advance.


Thanks for the help, guys, but I needed to parse custom data in the CSV and had to create my own custom parser. I am parsing each field separately and using regex strings in small chunks.
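The custom parser itself was not posted, so the following is only a minimal sketch of the idea described above: one small regex chunk per field, joined into a record pattern. The field formats (a numeric id, true/false, then a memo that is either quoted or free of commas, quotes and line breaks) are assumptions inferred from the sample data, and the names MemoCsv, Parse and rejected are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MemoCsv
{
    // One small pattern per field (formats assumed from the sample file).
    private const string Id = @"(?<id>[0-9]{1,9})";
    private const string Flag = @"(?<flag>true|false)";
    // Memo: either a quoted string (may contain commas, line breaks and
    // doubled quotes) or an unquoted string without commas, quotes or line breaks.
    private const string Memo =
        @"(?:""(?<memo>(?:[^""]|"""")*)""|(?<memo>[^"",\r\n]*))";

    // \G anchors each match at the current position; a record must be
    // followed by a line break or the end of the input.
    private static readonly Regex Record =
        new Regex(@"\G" + Id + "," + Flag + "," + Memo + @"(?:\r?\n|\z)");

    public static List<(int Id, bool Flag, string Memo)> Parse(
        string text, List<string> rejected)
    {
        var records = new List<(int Id, bool Flag, string Memo)>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = Record.Match(text, pos);
            if (m.Success)
            {
                records.Add((
                    int.Parse(m.Groups["id"].Value),
                    m.Groups["flag"].Value == "true",
                    // Collapse doubled quotes inside a quoted memo.
                    m.Groups["memo"].Value.Replace("\"\"", "\"")));
                pos = m.Index + m.Length;
            }
            else
            {
                // No valid record here: report the physical line and skip it.
                int eol = text.IndexOf('\n', pos);
                int end = eol < 0 ? text.Length : eol;
                string line = text.Substring(pos, end - pos).TrimEnd('\r');
                if (line.Trim().Length > 0)
                    rejected.Add(line);
                pos = eol < 0 ? text.Length : eol + 1;
            }
        }
        return records;
    }
}
```

Run over the sample file from the question, this sketch accepts seven records (including the memo that spans two lines) and rejects the 678 line because of its unquoted comma. A memo with an opening quote but no closing quote anywhere after it fails the quoted alternative and is rejected line by line as well.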

There is no need to reinvent the wheel. I recommend using an existing CSV parser, and there are many good alternatives.

I have had great success with CSVReader; it is very fast and easy to use. Basic usage:

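using System;
using System.IO;
using LumenWorks.Framework.IO.Csv; // assumption: CsvReader here is the LumenWorks reader

// Open the file; 'true' means the first row holds the column headers.
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();

    // Print every record as "header = value;" pairs, one record per line.
    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.Write("{0} = {1};", headers[i], csv[i]);

        Console.WriteLine();
    }
}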
