
How to properly split a CSV using C# split() function?

Suppose I have this CSV file:

NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"

I would like to store each token enclosed in double quotes as an element of an array. Is there a safe way to do this instead of using the String.Split() function? Currently I load the file into a RichTextBox, and then, using its Lines[] property, I loop over each Lines[] element and do this:

string[] line = s.Split(',');

s is the current element of RichTextBox.Lines[]. As you can clearly see, a comma inside a token can easily mess up Split(). So instead of ending up with three tokens as I want, I end up with six.

Any help will be appreciated!
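
To make the failure concrete, here is a minimal sketch that runs the same s.Split(',') call on the data line from the question. The class and member names (NaiveSplitDemo, Line, NaiveSplit) are hypothetical, chosen only for illustration.

```csharp
static class NaiveSplitDemo
{
    // The data line from the question, with the double quotes escaped for C#.
    public const string Line =
        "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";

    // Same call as in the question: every comma is treated as a separator,
    // including the three commas inside the quoted address.
    public static string[] NaiveSplit(string line) => line.Split(',');
}
```

NaiveSplit(Line) returns six strings. The address is broken into four pieces, and the quote characters and leading spaces stay attached to the pieces (for example, the second token is ` "Tamanan`).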

You could use regex too:

string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";

// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
    input.Substring(1, input.Length - 2), pattern);

This will give you:

Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
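
A related regex sketch (not part of the answer above) matches each quoted field directly with Regex.Matches instead of splitting. The outer quotes therefore never need to be stripped with Substring, and any spacing between fields is ignored. It assumes every field is enclosed in double quotes and that no field contains an escaped quote (""). Unquoted fields are silently skipped. QuotedTokenExtractor and Extract are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedTokenExtractor
{
    // Matches an opening double quote, then a run of non-quote characters
    // (captured in group 1), then the closing double quote.
    private static readonly Regex QuotedField = new Regex("\"([^\"]*)\"");

    // Returns the contents of every "..." section in the line, without the quotes.
    public static string[] Extract(string line) =>
        QuotedField.Matches(line)
                   .Cast<Match>()
                   .Select(m => m.Groups[1].Value)
                   .ToArray();
}
```

On the line from the question, Extract returns the same three values as the Regex.Split version.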

I've done this with my own method. It simply counts the number of " and ' characters: a comma only ends a token when that count is even.
Adapt it to your needs.

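    // requires: using System.Collections.Generic;
    public List<string> SplitCsvLine(string s) {
        int start = 0;      // index where the current token begins
        int quoteCount = 0; // number of " and ' characters seen so far
        List<string> tokens = new List<string>();
        for (int i = 0; i < s.Length; i++) {
            switch (s[i]) {
                case ',':
                    // An even count means we are outside any quoted section,
                    // so this comma ends the current token.
                    if ((quoteCount & 1) == 0) {
                        tokens.Add(s.Substring(start, i - start));
                        start = i + 1;
                    }
                    break;
                case '"':
                case '\'':
                    quoteCount++;
                    break;
            }
        }
        // Whatever follows the last separating comma is the final token.
        // Tokens keep their quote characters and any leading spaces.
        tokens.Add(s.Substring(start));
        return tokens;
    }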
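
To extend the counting idea, one option is a small state machine that tracks whether it is currently inside a double-quoted section. The sketch below is an illustration under stated assumptions, not the answerer's code:

- Only " acts as a quote character.
- A doubled quote ("") inside a quoted field is read as one literal quote (the RFC 4180 convention).
- The enclosing quotes are dropped.
- Fields never contain line breaks.

Because ' is ignored, an apostrophe inside a quoted value such as "O'Brien, Pat" does not change the state. The counting method above would split that value at its comma. QuoteAwareSplitter is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

static class QuoteAwareSplitter
{
    // Splits one CSV line on commas that are outside double-quoted sections.
    // Each field is passed through Trim(), so leading/trailing spaces are removed
    // even if they were inside the quotes.
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // "" inside quotes: one literal quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // commas inside quotes are kept
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString().Trim()); // separator: field ends
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString().Trim()); // last field has no trailing comma
        return fields;
    }
}
```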

It's not an exact answer to your question, but why not use an already written library to manipulate CSV files? A good example would be LinqToCsv. CSV can be delimited with various punctuation characters, and there are gotchas that library authors have already addressed, such as dealing with the header (column name) row, dealing with different date formats, and mapping rows to C# objects.

If your CSV line is tightly packed (no spaces between a closing quote, the comma and the next opening quote), it's easiest to strip the first and last characters as shown earlier and then do a simple split on the joining string ",".

string[] tokens = input.Substring(1, input.Length - 2).Split(new[] { "\",\"" }, StringSplitOptions.None);

This will only work if ALL fields are double-quoted, even those that don't (officially) need to be. It will be faster than a regex, but only under those conditions.

It's really useful if your data looks like this: "Name","1","12/03/2018","Add1,Add2,Add3","other stuff"
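
Wrapped into a self-contained helper, the trim-and-split approach might look like the sketch below. As the answer says, it assumes that every field is double-quoted and that fields are joined by exactly "," with no surrounding spaces. It uses the Split(string[], StringSplitOptions) overload because that overload exists on both .NET Framework and .NET Core. FullyQuotedSplitter is a hypothetical name.

```csharp
using System;

static class FullyQuotedSplitter
{
    // Assumes the line is at least two characters long and fully quoted,
    // e.g. "Name","1","12/03/2018".
    public static string[] Split(string line)
    {
        // Drop the opening quote of the first field and the closing quote of the last.
        string inner = line.Substring(1, line.Length - 2);
        // Every remaining "," sequence is a boundary between two fields.
        return inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
    }
}
```

On the example data above, this yields five values, with Add1,Add2,Add3 kept together as the fourth.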

You can replace "," with ; and then split on ; (this assumes the data itself never contains a semicolon):

var values = s.Replace("\",\"", ";").Split(';');
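
One detail worth noting: after Replace and Split(';'), the first value still starts with a quote and the last one still ends with one. The minimal sketch below strips those with Trim('"'). It assumes fully quoted fields joined by "," with no spaces, no ';' inside any value, and no value that legitimately begins or ends with a quote. ReplaceThenSplit is a hypothetical name.

```csharp
using System.Linq;

static class ReplaceThenSplit
{
    // Turns every "," between two quoted fields into ';', splits on ';',
    // then trims the stray quote left on the first and last values.
    public static string[] Split(string line) =>
        line.Replace("\",\"", ";")
            .Split(';')
            .Select(v => v.Trim('"'))
            .ToArray();
}
```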

This question is five years old, but there is always somebody new who wants to split a CSV.

If your data is simple and predictable (i.e. it never contains special characters such as commas, quotes or newlines inside a field), then you can do it with Split() or a regex.

But to support all the nuances of the CSV format properly without code soup, you should really use a library where all the magic has already been figured out. Don't reinvent the wheel (unless you are doing it for fun, of course).

CsvHelper is simple enough to use:

https://joshclose.github.io/CsvHelper/2.x/

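// CsvHelper 2.x; requires: using System.IO; using CsvHelper;
// textReader is any TextReader, e.g. a StreamReader over the CSV file.
using (var parser = new CsvParser(textReader))
{
    string[] line;
    // In CsvHelper 2.x, Read() returns null once there are no more rows.
    while ((line = parser.Read()) != null)
    {
        // do something with the fields in line
    }
}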

More discussion / same question: Dealing with commas in a CSV file

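Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.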

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM