CSV parsing error using a Regex object in C#
I am using a ParseCSV function to parse a CSV file in C#.
The last column in a row of the CSV file contains: NM 120922C00002500 (followed by many spaces).
I pass ParseCSV an input string obtained by reading the CSV file.
Part of the input string is:
"1",000066,"07/30/2012","53193315D4","B ","99AAXXPB0"," "," "," ","CALL NM 09/22/12 00002.500 ","MG",100.00,1.050000,310,32550.00,25530.70,360,37800.00,30477.78,"C",2.50000,09/22/2012,"NM"," NM 120922C00002500 ".
In the ParseCSV function, I do the following:
string csvParsingRegularExpressionOld = Prana.Global.ConfigurationHelper.Instance.GetAppSettingValueByKey("CSVParsingRegularExpression");
string csvParsingRegularExpression = csvParsingRegularExpressionOld.Replace("\\\\", "\\");
The value of csvParsingRegularExpression comes out as:
((?<field>[^",\r\n]*)|"(?<field>([^"]|"")*)")(,|(?<rowbreak>\r\n|\n|$))
Then I follow up with:
Regex re = new Regex(csvParsingRegularExpression);
MatchCollection mc = re.Matches(inputString);
foreach (Match m in mc)
{
    field = m.Result("${field}").Replace("\"\"", "\"");
}
But field ends up as an empty string when it comes to the last value, "NM 120922C00002500". What could be a solution to this problem?
I don't know whether the problem is with the CSV file or with the Regex method Matches.
Don't use Regex to read CSV.
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
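As an illustration of reading CSV without a regex, here is a minimal sketch that uses TextFieldParser from the Microsoft.VisualBasic.FileIO namespace rather than the reader linked above. It assumes the project can reference the Microsoft.VisualBasic assembly, and the sample row is a hypothetical, shortened and well-formed version of the one in the question (no stray character after the last closing quote). A row the parser cannot split raises MalformedLineException instead of silently producing an empty field.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

public static class CsvDemo
{
    public static void Main()
    {
        // Hypothetical sample: a shortened, well-formed version of the row in the question.
        string inputString =
            "\"1\",000066,\"07/30/2012\",\"NM\",\" NM 120922C00002500      \"";

        using (var parser = new TextFieldParser(new StringReader(inputString)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TrimWhiteSpace = false; // keep the padding inside quoted fields

            while (!parser.EndOfData)
            {
                // ReadFields returns one row, already split and unquoted.
                string[] fields = parser.ReadFields();
                foreach (string field in fields)
                {
                    Console.WriteLine("[" + field + "]");
                }
            }
        }
    }
}
```

Setting TrimWhiteSpace to true (the default) would strip the trailing spaces that the question mentions in the last column.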
You're not matching the last group because it ends with a period outside the quotes. If you add the period to the terminating group of your regex, it works:
(\"?(?<field>[^",\r|\n]*)\"?\,?)*\.?(?<rowbreak>[\r|\n]*)
Although, as other comments have pointed out, it's not a great idea to roll your own parser if the data really is valid CSV (I didn't bother to check whether the given sample matches the spec). There are plenty of parsers available, and you're likely to miss some edge cases.
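To check this diagnosis against your own data, it can help to print every match the original expression produces. The sketch below is only a diagnostic aid, not a fix: the input is a hypothetical, shortened line that keeps the period after the last closing quote, and the class name is made up for the example. If the diagnosis above is right, the quoted last value should not appear among the printed fields.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDiagnostics
{
    public static void Main()
    {
        // The expression from the question, written as a verbatim string
        // (each double quote is doubled, backslashes are left as-is).
        string pattern =
            @"((?<field>[^"",\r\n]*)|""(?<field>([^""]|"""")*)"")(,|(?<rowbreak>\r\n|\n|$))";

        // Hypothetical shortened input with a period after the last closing quote.
        string inputString = "\"NM\",\" NM 120922C00002500    \".";

        foreach (Match m in Regex.Matches(inputString, pattern))
        {
            // Show where each match starts and what the 'field' group captured.
            Console.WriteLine("index {0}: field=[{1}]", m.Index, m.Groups["field"].Value);
        }
    }
}
```

Running the same loop on the input without the trailing period makes it easy to compare the two sets of captures.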
If you don't absolutely need to use regex, here is a small class I made, followed by its usage:
using System;
using System.Collections.Generic;

public class ParseHelper
{
    public char TextDelimiter { get; set; }
    public char TextQualifier { get; set; }
    public char EscapeCharacter { get; set; }

    public List<string> Parse(string str, bool keepTextQualifiers = false)
    {
        List<string> returnedValues = new List<string>();
        bool inQualifiers = false;
        string currentWord = "";
        for (int i = 0; i < str.Length; i++)
        {
            // Escape character: take the next character literally.
            if (str[i] == EscapeCharacter)
            {
                i++;
                // Guard against an escape character at the very end of the input.
                if (i >= str.Length)
                    throw new FormatException("The input string, 'str', ends with an escape character.");
                currentWord += str[i];
                continue;
            }
            // Text qualifier: toggle the "inside quotes" state.
            if (str[i] == TextQualifier)
            {
                if (keepTextQualifiers)
                    currentWord += TextQualifier;
                inQualifiers = !inQualifiers;
                continue;
            }
            // Delimiter outside qualifiers: the current field is complete.
            if (str[i] == TextDelimiter && !inQualifiers)
            {
                returnedValues.Add(currentWord);
                currentWord = "";
                continue;
            }
            currentWord += str[i];
        }
        // An unclosed qualifier means the line is malformed.
        if (inQualifiers)
            throw new FormatException("The input string, 'str', is not properly formatted.");
        // Add the last field, which has no trailing delimiter.
        returnedValues.Add(currentWord);
        return returnedValues;
    }
}
Usage, based on your case:
ParseHelper ph = new ParseHelper()
{
    TextDelimiter = ',',
    TextQualifier = '"',
    EscapeCharacter = '\\'
};
List<string> parsedLine = ph.Parse(unparsedLine);