
CSV parsing error using regex object in C#

I am using a ParseCSV function to parse a CSV file in C#.

The last column in a row of the CSV file contains: NM 120922C00002500 (followed by a lot of spaces).

I pass the ParseCSV function an input string, which is the result of reading the CSV file.

Part of the input string is:

"1",000066,"07/30/2012","53193315D4","B ","99AAXXPB0"," "," "," ","CALL NM 09/22/12 00002.500 ","MG",100.00,1.050000,310,32550.00,25530.70,360,37800.00,30477.78,"C",2.50000,09/22/2012,"NM"," NM 120922C00002500 ".

In the ParseCSV function, I am doing the following:

string csvParsingRegularExpressionOld = Prana.Global.ConfigurationHelper.Instance.GetAppSettingValueByKey("CSVParsingRegularExpression");
string csvParsingRegularExpression = csvParsingRegularExpressionOld.Replace("\\\\", "\\");

The value of csvParsingRegularExpression comes out as:

((?<field>[^",\r\n]*)|"(?<field>([^"]|"")*)")(,|(?<rowbreak>\r\n|\n|$))

Then I follow up with:

Regex re = new Regex(csvParsingRegularExpression);
MatchCollection mc = re.Matches(inputString);

foreach (Match m in mc)
{
    field = m.Result("${field}").Replace("\"\"", "\"");
}

But here field contains an empty string when it comes to the last value, "NM 120922C00002500". What could be a solution to this problem?

I don't know whether the problem is with the CSV file or with the regex method Matches.

You're not matching the last group because it ends with a period outside the quotes. If you add the period to the terminating group of your regex, it works:

(\"?(?<field>[^",\r|\n]*)\"?\,?)*\.?(?<rowbreak>[\r|\n]*)
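To see what the question's original pattern actually yields, it can be run against a shortened version of the sample. The sketch below is only an illustration: the class name CsvRegexDemo, the Fields helper, and the two-field input are assumptions made for the example, and m.Groups["field"].Value is used in place of m.Result("${field}").

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code base.
public static class CsvRegexDemo
{
    // The pattern from the question, written as a C# verbatim string.
    private const string Pattern =
        @"((?<field>[^"",\r\n]*)|""(?<field>([^""]|"""")*)"")(,|(?<rowbreak>\r\n|\n|$))";

    // Returns the value of the "field" group for every match, in order.
    public static List<string> Fields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Same post-processing as in the question: "" becomes ".
            fields.Add(m.Groups["field"].Value.Replace("\"\"", "\""));
        }
        return fields;
    }
}
```

Calling Fields on "NM"," NM 120922C00002500 " should give three entries: NM, the padded last value, and an empty string. The empty entry comes from a final zero-length match at the end of the input, which this pattern always allows, so a loop that keeps only the most recent field ends on an empty string either way. With a period appended after the closing quote, the quoted value should no longer be matched at all, and the result should contain a lone "." instead.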

Although, as other comments have pointed out, it's not a great idea to roll your own parser if the data is really valid CSV (I didn't bother to check whether the given sample matches the spec). There are plenty of parsers available, and you're likely to miss some edge cases.
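As one example of an existing parser, .NET ships TextFieldParser in the Microsoft.VisualBasic.FileIO namespace, which can be used from C#. The sketch below is a minimal illustration, assuming the input is well-formed (no stray period after the closing quote) and that a reference to the Microsoft.VisualBasic assembly is available; the class name ExistingParserDemo and the method ReadRows are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper around the framework's TextFieldParser.
public static class ExistingParserDemo
{
    // Parses comma-delimited text with optional double-quoted fields
    // and returns one string array per row.
    public static List<string[]> ReadRows(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a line such as "1",000066,"07/30/2012","NM"," NM 120922C00002500 " this should return a single row of five fields. Whether the padding spaces in the last field survive depends on the parser's TrimWhiteSpace setting, so calling Trim() on the values is the safer choice if the spaces are not wanted.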

If you aren't set on using regex, here is a small class I made, followed by its usage:

using System;
using System.Collections.Generic;

public class ParseHelper
{
    public char TextDelimiter { get; set; }
    public char TextQualifier { get; set; }
    public char EscapeCharacter { get; set; }

    public List<string> Parse(string str, bool keepTextQualifiers = false)
    {
        List<string> returnedValues = new List<string>();

        bool inQualifiers = false;
        string currentWord = "";

        for (int i = 0; i < str.Length; i++)
        {
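            //Looking for EscapeCharacter.
            if (str[i] == EscapeCharacter)
            {
                //An escape character must be followed by the character it escapes.
                if (i + 1 >= str.Length)
                    throw new FormatException("The input string, 'str', ends with a dangling escape character.");

                i++;
                currentWord += str[i];
                continue;
            }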

            //Looking for TextQualifier.
            if (str[i] == TextQualifier)
            {
                if (keepTextQualifiers)
                    currentWord += TextQualifier;

                inQualifiers = !inQualifiers;
                continue;
            }

            //Looking for TextDelimiter.
            if (str[i] == TextDelimiter && !inQualifiers)
            {
                returnedValues.Add(currentWord);
                currentWord = "";
                continue;
            }

            currentWord += str[i];
        }

        if (inQualifiers)
            throw new FormatException("The input string, 'str', is not properly formatted.");

        returnedValues.Add(currentWord);
        currentWord = "";

        return returnedValues;
    }
}

Usage, based on your case:

ParseHelper ph = new ParseHelper() {
    TextDelimiter = ',',
    TextQualifier = '"',
    EscapeCharacter = '\\'};
List<string> parsedLine = ph.Parse(unparsedLine);
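One thing to keep in mind: the question's code replaces "" with ", which suggests the data may escape quotes by doubling them rather than with a backslash. A qualifier-toggling parser like the one above drops such doubled quotes. The following is a rough sketch of a line splitter that treats "" inside a quoted field as a literal quote; the class name CsvLineSplitter and the Split method are hypothetical, and it handles a single line only (no embedded line breaks).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical single-line splitter; an illustration, not a full CSV parser.
public static class CsvLineSplitter
{
    // Splits one line on commas, honoring double-quoted fields and
    // treating "" inside a quoted field as one literal quote character.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote: keep one, skip the other
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;         // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);       // unquoted character, kept as-is
            }
        }

        if (inQuotes)
            throw new FormatException("Unterminated quoted field.");

        fields.Add(current.ToString());
        return fields;
    }
}
```

This version is deliberately lenient: characters that follow a closing quote, such as the trailing period in the sample, are simply appended to the field instead of being rejected, so callers may still want to validate or trim the result.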
