
CSV Parsing error using regex object C#

I am using a ParseCSV function to parse a CSV file in C#.

The last column in a row of the CSV file contains: NM 120922C00002500 (followed by lots of spaces)

I pass ParseCSV an inputString, which holds the text read from the CSV file.

A part of the inputstring is:

"1",000066,"07/30/2012","53193315D4","B ","99AAXXPB0"," "," "," ","CALL NM 09/22/12 00002.500 ","MG",100.00,1.050000,310,32550.00,25530.70,360,37800.00,30477.78,"C",2.50000,09/22/2012,"NM"," NM 120922C00002500 ".

In the ParseCSV function, I am doing the following:

string csvParsingRegularExpressionOld = Prana.Global.ConfigurationHelper.Instance.GetAppSettingValueByKey("CSVParsingRegularExpression");
string csvParsingRegularExpression = csvParsingRegularExpressionOld.Replace("\\\\", "\\");

The value of csvParsingRegularExpression comes out as:

((?<field>[^",\r\n]*)|"(?<field>([^"]|"")*)")(,|(?<rowbreak>\r\n|\n|$))

Then I follow up with:

Regex re = new Regex(csvParsingRegularExpression);

MatchCollection mc = re.Matches(inputString);

foreach (Match m in mc) 
{

   field = m.Result("${field}").Replace("\"\"", "\"");
}

But field contains an empty string when it gets to the last value, "NM 120922C00002500". What could be a solution to this problem?

I don't know whether the problem is in the CSV file or in the regex method Matches.

You're not matching the last group because it ends with a period outside the quotes. If you add the period to the terminating group of your regex it works:

(\"?(?<field>[^",\r\n]*)\"?\,?)*\.?(?<rowbreak>[\r\n]*)
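One way to check this diagnosis is to run the pattern from the question against a shortened row, once without and once with the trailing period. The sketch below is only an illustration: the class name TrailingPeriodDemo, the method Fields, and the two sample rows are made up for this purpose, and only the pattern itself is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class TrailingPeriodDemo
{
    // Pattern copied from the question (C# string escapes added).
    private const string Pattern =
        "((?<field>[^\",\\r\\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))";

    // Returns the value of the "field" group for every match, in order.
    public static List<string> Fields(string input)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Same post-processing as in the question: collapse doubled quotes.
            fields.Add(m.Groups["field"].Value.Replace("\"\"", "\""));
        }
        return fields;
    }
}
```

For the row "a","NM 1 " I would expect Fields to return a, NM 1 (with its trailing space) and then an empty string, because the pattern can also match the empty string at the very end of the input. For the same row followed by a period, the quoted value should be skipped and the list should contain a, a lone period, and the empty string. Note that the loop in the question overwrites field on every iteration, so the value left after the loop would be that final empty match in both cases.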

Although, as other comments have pointed out, it's not a great idea to roll your own parser if the data really is valid CSV (I didn't bother to check whether the given sample matches the spec). There are plenty of parsers available, and a hand-written one is likely to miss some edge cases.
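As one example of an existing parser, the framework ships TextFieldParser in the Microsoft.VisualBasic.FileIO namespace (usable from C# by referencing the Microsoft.VisualBasic assembly). The following is a minimal sketch, not a drop-in replacement for ParseCSV: the class name CsvReaderSketch and the method ReadRows are made up here, and it assumes the input is well-formed CSV, so a stray character after a closing quote, like the period in the sample, would likely be rejected as a malformed line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, not part of the original code.
public static class CsvReaderSketch
{
    // Reads every row of the given CSV text into an array of field values.
    public static List<string[]> ReadRows(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            // Keep the padding inside fields such as " NM 120922C00002500 ".
            parser.TrimWhiteSpace = false;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Setting TrimWhiteSpace to true instead would strip the leading and trailing spaces from each field, which may be what is wanted for the padded last column.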

If you don't absolutely need to use regex, here is a small class I made, followed by its usage:

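using System;
using System.Collections.Generic;
using System.Text;

public class ParseHelper
{
    public char TextDelimiter { get; set; }
    public char TextQualifier { get; set; }
    public char EscapeCharacter { get; set; }

    // Splits one line into fields. Delimiters inside text qualifiers are kept as data.
    public List<string> Parse(string str, bool keepTextQualifiers = false)
    {
        List<string> returnedValues = new List<string>();

        bool inQualifiers = false;
        StringBuilder currentWord = new StringBuilder();

        for (int i = 0; i < str.Length; i++)
        {
            // Escape character: take the next character literally.
            if (str[i] == EscapeCharacter)
            {
                // Guard against an escape character at the very end of the input.
                if (i + 1 >= str.Length)
                    throw new FormatException("The input string, 'str', ends with an unfinished escape sequence.");

                i++;
                currentWord.Append(str[i]);
                continue;
            }

            // Text qualifier: toggle the "inside qualifiers" state.
            if (str[i] == TextQualifier)
            {
                if (keepTextQualifiers)
                    currentWord.Append(TextQualifier);

                inQualifiers = !inQualifiers;
                continue;
            }

            // Text delimiter outside qualifiers: the current field is complete.
            if (str[i] == TextDelimiter && !inQualifiers)
            {
                returnedValues.Add(currentWord.ToString());
                currentWord.Clear();
                continue;
            }

            currentWord.Append(str[i]);
        }

        // An odd number of qualifiers means a field was never closed.
        if (inQualifiers)
            throw new FormatException("The input string, 'str', is not properly formatted.");

        // Add the last field, which has no trailing delimiter.
        returnedValues.Add(currentWord.ToString());

        return returnedValues;
    }
}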

Usage, based on your case:

ParseHelper ph = new ParseHelper() {
    TextDelimiter = ',',
    TextQualifier = '"',
    EscapeCharacter = '\\'};
List<string> parsedLine = ph.Parse(unparsedLine);
