How can I Split(',') a string while ignoring commas in between quotes?

I am using the .Split(',') method on a string that I know has values delimited by commas, and I want those values to be separated and put into a string[] object. This works great for strings like this:

78,969.82,GW440,...

But the values start to look different when that second value goes over 1000, as in this example:

79,"1,013.42",GW450,...

These values come from a spreadsheet control where I use the control's built-in ExportToCsv(...) method, which explains why I get a formatted version of the actual numerical value.

Question

Is there a way I can get the .Split(',') method to ignore commas inside of quotes? I don't actually want the value "1,013.42" to be split up as "1" and "013.42".

Any ideas? Thanks!

Update

I would really like to do this without incorporating a third-party tool. My use case doesn't involve many other cases besides this one, and even though it is part of my work's solution, incorporating a tool like that doesn't really benefit anyone at the moment. I was hoping there was something quick I was missing that would solve this particular use case, but now that it is the weekend, I'll see if I can give one more update to this question on Monday with the solution I eventually come up with. Thank you everyone for your assistance so far; I will assess each answer further on Monday.

This is a fairly straightforward CSV reader implementation we use in a few projects here. It is easy to use and handles the cases you are talking about.

First, the CSV class:

public static class Csv
{
    public static string Escape(string s)
    {
        if (s.Contains(QUOTE))
            s = s.Replace(QUOTE, ESCAPED_QUOTE);

        if (s.IndexOfAny(CHARACTERS_THAT_MUST_BE_QUOTED) > -1)
            s = QUOTE + s + QUOTE;

        return s;
    }

    public static string Unescape(string s)
    {
        // The length check keeps a lone quote character from reaching Substring(1, -1).
        if (s.Length >= 2 && s.StartsWith(QUOTE) && s.EndsWith(QUOTE))
        {
            s = s.Substring(1, s.Length - 2);

            if (s.Contains(ESCAPED_QUOTE))
                s = s.Replace(ESCAPED_QUOTE, QUOTE);
        }

        return s;
    }


    private const string QUOTE = "\"";
    private const string ESCAPED_QUOTE = "\"\"";
    private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };

}

Then a pretty nice reader implementation, if you need it. You should be able to do what you need with just the CSV class above.

// Requires: using System.IO; using System.Text.RegularExpressions;
public sealed class CsvReader : System.IDisposable
{
    public CsvReader(string fileName)
        : this(new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
    }

    public CsvReader(Stream stream)
    {
        __reader = new StreamReader(stream);
    }

    public System.Collections.IEnumerable RowEnumerator
    {
        get
        {
            if (null == __reader)
                throw new System.ApplicationException("I can't start reading without CSV input.");

            __rowno = 0;
            string sLine;
            string sNextLine;

            while (null != (sLine = __reader.ReadLine()))
            {
                while (rexRunOnLine.IsMatch(sLine) && null != (sNextLine = __reader.ReadLine()))
                    sLine += "\n" + sNextLine;

                __rowno++;
                string[] values = rexCsvSplitter.Split(sLine);

                for (int i = 0; i < values.Length; i++)
                    values[i] = Csv.Unescape(values[i]);

                yield return values;
            }

            __reader.Close();
        }

    }

    public long RowIndex { get { return __rowno; } }

    public void Dispose()
    {
        if (null != __reader) __reader.Dispose();
    }

    //============================================


    private long __rowno = 0;
    private TextReader __reader;
    private static Regex rexCsvSplitter = new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");
    private static Regex rexRunOnLine = new Regex(@"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$");

}

Then you can use it like this:

var reader = new CsvReader(new FileStream(file, FileMode.Open));

Note: This opens an existing CSV file, but it can be modified fairly easily to take a string[] like you need.
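For the single-line case in the question, the splitter regex from the reader above can be applied directly to a string. The following is only a minimal sketch and not part of the original answer: the names QuotedSplit and SplitLine are made up for illustration, and it assumes each input string holds one complete record with no embedded newlines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSplit
{
    // Matches a comma only when an even number of quote characters follows it,
    // i.e. when the comma sits outside any quoted field.
    private static readonly Regex Splitter =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    // Hypothetical helper: splits one CSV record and unquotes each field.
    public static string[] SplitLine(string line)
    {
        string[] values = Splitter.Split(line);
        for (int i = 0; i < values.Length; i++)
        {
            string v = values[i];
            // Strip the surrounding quotes and collapse doubled quotes.
            if (v.Length >= 2 && v.StartsWith("\"") && v.EndsWith("\""))
                values[i] = v.Substring(1, v.Length - 2).Replace("\"\"", "\"");
        }
        return values;
    }

    public static void Main()
    {
        // Prints the three fields 79, 1,013.42 and GW450 on separate lines.
        foreach (string field in SplitLine("79,\"1,013.42\",GW450"))
            Console.WriteLine(field);
    }
}
```

Because the lookahead rescans the rest of the line for every comma, this is probably best kept to short records like the ones in the question.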

Since you're reading a CSV file, the best course of action would be to use an existing CSV reader. There's more to CSV than just commas between quotes. Finding all of the cases you need to handle would be more work than it's worth.
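One existing reader ships with .NET itself, so no third-party tool is involved: TextFieldParser in the Microsoft.VisualBasic.FileIO namespace. The sketch below is not from the original answer. It assumes the project can reference the Microsoft.VisualBasic assembly, and the names BuiltInCsv and ParseLine are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class BuiltInCsv
{
    // Hypothetical helper: parses one CSV record held in a string.
    // ReadFields returns null when there is no record to read (e.g. empty input).
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat quoted fields as single values, so commas inside them are kept.
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        foreach (string field in ParseLine("79,\"1,013.42\",GW450"))
            Console.WriteLine(field);
    }
}
```

The same parser can also be constructed from a file path or a Stream, which may suit the exported spreadsheet file better than calling it once per line.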

Here's a CSV reader question on SO.

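You should probably read this: a regex for comma-based splitting that ignores commas inside quotes. Although it targets Java, the regular expression is the same.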
