Parsing CSV files with multiple formats in C# using regex
I have been trying to parse a CSV file with three fields. The first two fields are simple and easily extracted. The problem is the third field, which is free text and can therefore contain special characters, including the ',' that is used to delimit the fields.
I tried enclosing the string field in double quotes ("), but my requirement is that a simple string (without special characters) may also appear without double quotes. I also need to handle line breaks inside the string. Below is a sample CSV file.
123,true,This is a memo
234,false,"This is also a memo"
345,true,
456,true,Above me is a blank memo
567,false,"This has a ,
in it"
678,true,This has a , in it <--- This record should be rejected
789,false,""
890,true,Above me is also a valid blank memo
I also found a good tool for testing the regex format string at http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
So far I have used the following format string: ^(""(?:[^""]|"""")*""|[^,]*),(""(?:[^""]|"""")*""|[^,]*)$
The problem with this format string is that it does not handle fields that span multiple lines, and it does not reject a field that starts with a double quote but has no closing double quote.
Thanks in advance.
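One way to address both problems with the format string above is sketched below. It is a hypothetical helper (the names MemoCsv and Parse are not from the question) that assumes every record is id,flag,memo and that the first two fields never contain commas, quotes or line breaks. Instead of matching line by line, it runs one regex over the whole text and uses the \G anchor, so each match must start exactly where the previous record ended. The quoted alternative may span line breaks, and a record that does not fit (such as the 678 line) is rejected, after which parsing resumes on the next physical line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration: parses "id,flag,memo" records.
public static class MemoCsv
{
    // \G pins each match to the position where the previous record ended.
    // Fields 1 and 2: any run without a comma or line break.
    // Field 3: a quoted string ("" escapes a quote; commas and line breaks
    // are allowed inside) OR an unquoted run with no comma, quote or line break.
    // The record must then end with a line break or the end of the text.
    private static readonly Regex Record = new Regex(
        @"\G([^,\r\n]*),([^,\r\n]*),(""(?:[^""]|"""")*""|[^,""\r\n]*)(?:\r?\n|\z)");

    public static List<string[]> Parse(string text, List<string> rejected)
    {
        var records = new List<string[]>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = Record.Match(text, pos);
            if (m.Success)
            {
                records.Add(new[] { m.Groups[1].Value, m.Groups[2].Value, Unquote(m.Groups[3].Value) });
                pos = m.Index + m.Length;
            }
            else
            {
                // No valid record starts here: reject this physical line
                // and resume parsing on the next one.
                int newline = text.IndexOf('\n', pos);
                int end = newline < 0 ? text.Length : newline + 1;
                rejected.Add(text.Substring(pos, end - pos).TrimEnd('\r', '\n'));
                pos = end;
            }
        }
        return records;
    }

    // Strips the surrounding quotes of a quoted memo and turns "" back into ".
    private static string Unquote(string field) =>
        field.Length > 0 && field[0] == '"'
            ? field.Substring(1, field.Length - 2).Replace("\"\"", "\"")
            : field;
}
```

On the sample file this should accept seven records (including the two-line 567 memo and both blank memos) and reject only the 678 line. One caveat: an opening quote with no closing quote is rejected only if no later quote can close it; otherwise the memo may run on to that later quote, which is an ambiguity of the format itself rather than of this particular regex.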
Thanks for the help, guys, but I needed to parse custom data in the CSV and had to write my own parser. I parse each field separately, using a small regex for each chunk.
There is no need to reinvent this wheel. I recommend using an existing CSV parser; there are many good alternatives.
I have had great success with CSVReader; it's very fast and easy to use. Basic usage:
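using System;
using System.IO;
using LumenWorks.Framework.IO.Csv; // namespace of the LumenWorks CsvReader

// true = the first row of data.csv contains the column headers
using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();
    while (csv.ReadNextRecord())
    {
        // print every field of the current record as "header = value;"
        for (int i = 0; i < fieldCount; i++)
            Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));
        Console.WriteLine();
    }
}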
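Among the alternatives to CsvReader, one that ships with .NET itself is TextFieldParser in the Microsoft.VisualBasic.FileIO namespace, which is usable from C# (in .NET Framework projects it needs a reference to Microsoft.VisualBasic.dll). The sketch below swaps it in for CsvReader to read the question's id,flag,memo format. The class name MemoCsvBuiltIn and the rule "reject any record that does not have exactly three fields" are assumptions for illustration, not part of the original answer. With HasFieldsEnclosedInQuotes set, a quoted field may contain commas and line breaks, so the 678 line comes back as four fields and is rejected.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: reads "id,flag,memo" records with TextFieldParser.
public static class MemoCsvBuiltIn
{
    public static List<string[]> Parse(TextReader input, List<string> rejected)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "..." fields may contain commas and line breaks
            parser.TrimWhiteSpace = false;           // keep memo text exactly as written

            while (!parser.EndOfData)
            {
                try
                {
                    string[] fields = parser.ReadFields();
                    if (fields.Length == 3)
                        records.Add(fields);
                    else
                        rejected.Add(string.Join(",", fields)); // e.g. unquoted memo containing a comma
                }
                catch (MalformedLineException)
                {
                    // e.g. a quote that is never closed; the parser skips the bad line
                    rejected.Add(parser.ErrorLine);
                }
            }
        }
        return records;
    }
}
```

For a file on disk, `MemoCsvBuiltIn.Parse(new StreamReader("data.csv"), rejected)` would be the equivalent call; rejecting on field count is the simplest way to express the question's rule that an unquoted memo must not contain a comma.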