How to read a CSV file into a .NET DataTable

How can I load a CSV file into a System.Data.DataTable, creating the DataTable based on the CSV file?

Does the regular ADO.NET functionality allow this?

I have been using the OleDb provider. However, it has problems when you read rows that contain numeric values but want them treated as text. You can get around that issue by creating a schema.ini file. Here is the method I used:

// using System.Data;
// using System.Data.OleDb;
// using System.Globalization;
// using System.IO;

static DataTable GetDataTableFromCsv(string path, bool isFirstRowHeader)
{
    string header = isFirstRowHeader ? "Yes" : "No";

    string pathOnly = Path.GetDirectoryName(path);
    string fileName = Path.GetFileName(path);

    string sql = @"SELECT * FROM [" + fileName + "]";

    using(OleDbConnection connection = new OleDbConnection(
              @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathOnly + 
              ";Extended Properties=\"Text;HDR=" + header + "\""))
    using(OleDbCommand command = new OleDbCommand(sql, connection))
    using(OleDbDataAdapter adapter = new OleDbDataAdapter(command))
    {
        DataTable dataTable = new DataTable();
        dataTable.Locale = CultureInfo.CurrentCulture;
        adapter.Fill(dataTable);
        return dataTable;
    }
}
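
The schema.ini workaround mentioned above can be sketched roughly as follows. This is not part of the original answer: the class and method names (SchemaIniWriter, BuildSection, WriteFor) are hypothetical, and the sketch assumes every column should be declared as Text. The text driver looks for schema.ini in the same folder as the CSV file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original answer): writes a schema.ini
// that declares every column as Text so numeric-looking values stay strings.
static class SchemaIniWriter
{
    // Builds the schema.ini section for a single CSV file.
    public static string BuildSection(string csvFileName, bool hasHeader, params string[] columnNames)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[" + csvFileName + "]");
        sb.AppendLine("Format=CSVDelimited");
        sb.AppendLine("ColNameHeader=" + (hasHeader ? "True" : "False"));
        for (int i = 0; i < columnNames.Length; i++)
        {
            // Column entries are numbered from 1; names are quoted in case they contain spaces.
            sb.AppendLine("Col" + (i + 1) + "=\"" + columnNames[i] + "\" Text");
        }
        return sb.ToString();
    }

    // Writes schema.ini next to the CSV file.
    // Note: this overwrites any schema.ini that already exists in that folder.
    public static void WriteFor(string csvPath, bool hasHeader, params string[] columnNames)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(csvPath));
        string section = BuildSection(Path.GetFileName(csvPath), hasHeader, columnNames);
        File.WriteAllText(Path.Combine(folder, "schema.ini"), section);
    }
}
```

Calling SchemaIniWriter.WriteFor(path, true, "Id", "ZipCode") before GetDataTableFromCsv(path, true) should make the driver read both columns as text, assuming the listed names match the columns in the file.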

Here's an excellent class that copies CSV data into a DataTable, using the structure of the data to create the DataTable:

A portable and efficient generic parser for flat files

It's easy to configure and easy to use. I urge you to take a look.

I have decided to use Sebastien Lorion's Csv Reader.

Jay Riggs' suggestion is also a great solution, but I just didn't need all of the features that Andrew Rissing's Generic Parser provides.

UPDATE 10/25/2010

After using Sebastien Lorion's Csv Reader in my project for nearly a year and a half, I have found that it throws exceptions when parsing some CSV files that I believe to be well formed.

So I switched to Andrew Rissing's Generic Parser, and it seems to be doing much better.

UPDATE 9/22/2014

These days, I mostly use this extension method to read delimited text:

https://github.com/Core-Techs/Common/blob/master/CoreTechs.Common/Text/DelimitedTextExtensions.cs#L22

https://www.nuget.org/packages/CoreTechs.Common/

UPDATE 2/20/2015

Example:

var csv = @"Name, Age
Ronnie, 30
Mark, 40
Ace, 50";

TextReader reader = new StringReader(csv);
var table = new DataTable();
using(var it = reader.ReadCsvWithHeader().GetEnumerator())
{

    if (!it.MoveNext()) return;

    foreach (var k in it.Current.Keys)
        table.Columns.Add(k);

    do
    {
        var row = table.NewRow();
        foreach (var k in it.Current.Keys)
            row[k] = it.Current[k];
    
        table.Rows.Add(row);
    
    } while (it.MoveNext());
}

Hey, it's working 100%:

  public static DataTable ConvertCSVtoDataTable(string strFilePath)
  {
    DataTable dt = new DataTable();
    using (StreamReader sr = new StreamReader(strFilePath))
    {
        string[] headers = sr.ReadLine().Split(',');
        foreach (string header in headers)
        {
            dt.Columns.Add(header);
        }
        while (!sr.EndOfStream)
        {
            string[] rows = sr.ReadLine().Split(',');
            DataRow dr = dt.NewRow();
            for (int i = 0; i < headers.Length; i++)
            {
                dr[i] = rows[i];
            }
            dt.Rows.Add(dr);
        }

    }


    return dt;
   }

[Image: the CSV file]

[Image: the imported DataTable]

We always used the Jet.OLEDB driver, until we started moving to 64-bit applications. Microsoft has not released and will not release a 64-bit Jet driver. Here's a simple solution we came up with that uses File.ReadAllLines and String.Split to read and parse the CSV file and manually load a DataTable. As noted above, it DOES NOT handle the situation where one of the column values contains a comma. We use this mostly for reading custom configuration files; the nice part about using CSV files is that we can edit them in Excel.

string CSVFilePathName = @"C:\test.csv";
string[] Lines = File.ReadAllLines(CSVFilePathName);
string[] Fields;
Fields = Lines[0].Split(new char[] { ',' });
int Cols = Fields.GetLength(0);
DataTable dt = new DataTable();
//1st row must be column names; force lower case to ensure matching later on.
for (int i = 0; i < Cols; i++)
    dt.Columns.Add(Fields[i].ToLower(), typeof(string));
DataRow Row;
for (int i = 1; i < Lines.GetLength(0); i++)
{
    Fields = Lines[i].Split(new char[] { ',' });
    Row = dt.NewRow();
    for (int f = 0; f < Cols; f++)
        Row[f] = Fields[f];
    dt.Rows.Add(Row);
}
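
Since String.Split breaks on values that contain commas, here is a minimal sketch of a quote-aware replacement for the Split calls above. It is not part of the original answer; the class name CsvLineSplitter is hypothetical. It assumes Excel-style quoting (fields wrapped in double quotes, a doubled quote meaning a literal quote) and it does not handle quoted fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the original answer): splits one CSV line,
// keeping delimiters that appear inside double-quoted fields.
static class CsvLineSplitter
{
    public static string[] Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote inside quotes is a literal quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // delimiters inside quotes are ordinary text
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing delimiter
        return fields.ToArray();
    }
}
```

With this in place, Lines[i].Split(new char[] { ',' }) could be swapped for CsvLineSplitter.Split(Lines[i]) while the rest of the loop stays the same.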

This is the code I use, but your app must target .NET 3.5 or later (it relies on the LINQ methods First and Skip):

private void txtRead_Click(object sender, EventArgs e)
        {
           // var filename = @"d:\shiptest.txt";

            openFileDialog1.InitialDirectory = "d:\\";
            openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
            DialogResult result = openFileDialog1.ShowDialog();
            if (result == DialogResult.OK)
            {
                if (openFileDialog1.FileName != "")
                {
                    var reader = ReadAsLines(openFileDialog1.FileName);

                    var data = new DataTable();

                    //this assume the first record is filled with the column names
                    var headers = reader.First().Split(',');
                    foreach (var header in headers)
                    {
                        data.Columns.Add(header);
                    }

                    var records = reader.Skip(1);
                    foreach (var record in records)
                    {
                        data.Rows.Add(record.Split(','));
                    }

                    dgList.DataSource = data;
                }
            }
        }

        static IEnumerable<string> ReadAsLines(string filename)
        {
            using (StreamReader reader = new StreamReader(filename))
                while (!reader.EndOfStream)
                    yield return reader.ReadLine();
        }

You can achieve this in C# with the Microsoft.VisualBasic.FileIO.TextFieldParser class (add a reference to the Microsoft.VisualBasic assembly):

static void Main()
        {
            string csv_file_path=@"C:\Users\Administrator\Desktop\test.csv";

            DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);

            Console.WriteLine("Rows count:" + csvData.Rows.Count);

            Console.ReadLine();
        }


private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();

            try
            {

            using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn datecolumn = new DataColumn(column);
                        datecolumn.AllowDBNull = true;
                        csvData.Columns.Add(datecolumn);
                    }

                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();
                        //Making empty value as null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
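            catch (Exception ex)
            {
                // Don't swallow the error silently: report it, then return whatever was loaded so far.
                Console.Error.WriteLine("Failed to read CSV file: " + ex.Message);
            }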
            return csvData;
        }

The best option I have found is FileHelpers. It resolves issues where you may have different versions of Office installed, and also the 32/64-bit issues that Chuck Bevitt mentioned.

It can be added to your project references using NuGet, and it provides a one-liner solution:

CommonEngine.CsvToDataTable(path, "ImportRecord", ',', true);

Modified from Mr. Chuck Bevitt's answer.

Working solution:

string CSVFilePathName = APP_PATH + "Facilities.csv";
string[] Lines = File.ReadAllLines(CSVFilePathName);
string[] Fields;
Fields = Lines[0].Split(new char[] { ',' });
int Cols = Fields.GetLength(0);
DataTable dt = new DataTable();
//1st row must be column names; force lower case to ensure matching later on.
for (int i = 0; i < Cols-1; i++)
        dt.Columns.Add(Fields[i].ToLower(), typeof(string));
DataRow Row;
for (int i = 0; i < Lines.GetLength(0)-1; i++)
{
        Fields = Lines[i].Split(new char[] { ',' });
        Row = dt.NewRow();
        for (int f = 0; f < Cols-1; f++)
                Row[f] = Fields[f];
        dt.Rows.Add(Row);
}

For those of you who don't want to use an external library and prefer not to use OleDB, see the example below. Everything I found was either OleDB, an external library, or simply splitting on a comma! In my case OleDB was not working, so I wanted something different.

I found an article by MarkJ that references the Microsoft.VisualBasic.FileIO.TextFieldParser method, as seen here. The article is written in VB and doesn't return a DataTable, so see my example below.

public static DataTable LoadCSV(string path, bool hasHeader)
    {
        DataTable dt = new DataTable();

        using (var MyReader = new Microsoft.VisualBasic.FileIO.TextFieldParser(path))
        {
            MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
            MyReader.Delimiters = new String[] { "," };

            string[] currentRow;

            //'Loop through all of the fields in the file.  
            //'If any lines are corrupt, report an error and continue parsing.  
            bool firstRow = true;
            while (!MyReader.EndOfData)
            {
                try
                {
                    currentRow = MyReader.ReadFields();

                    //Add the header columns
                    if (hasHeader && firstRow)
                    {
                        foreach (string c in currentRow)
                        {
                            dt.Columns.Add(c, typeof(string));
                        }

                        firstRow = false;
                        continue;
                    }

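                    //Without a header row, generate column names from the first data row
                    if (dt.Columns.Count == 0)
                    {
                        for (int c = 0; c < currentRow.Length; c++)
                        {
                            dt.Columns.Add("Column" + (c + 1), typeof(string));
                        }
                    }

                    //Create a new row
                    DataRow dr = dt.NewRow();
                    dt.Rows.Add(dr);

                    //Loop thru the current line and fill the data out
                    for (int c = 0; c < currentRow.Length; c++)
                    {
                        dr[c] = currentRow[c];
                    }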
                }
                catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
                {
                    //Handle the exception here
                }
            }
        }

        return dt;
    }

Very basic answer: if you don't have a complex CSV and can use a simple split function, this will work well for importing (note that this imports everything as strings; I do datatype conversions later if I need to).

 private DataTable csvToDataTable(string fileName, char splitCharacter)
    {                
        StreamReader sr = new StreamReader(fileName);
        string myStringRow = sr.ReadLine();
        var rows = myStringRow.Split(splitCharacter);
        DataTable CsvData = new DataTable();
        foreach (string column in rows)
        {
            //creates the columns of new datatable based on first row of csv
            CsvData.Columns.Add(column);
        }
        myStringRow = sr.ReadLine();
        while (myStringRow != null)
        {
            //runs until string reader returns null and adds rows to dt 
            rows = myStringRow.Split(splitCharacter);
            CsvData.Rows.Add(rows);
            myStringRow = sr.ReadLine();
        }
        sr.Close();
        sr.Dispose();
        return CsvData;
    }
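
The "datatype conversions later" step mentioned above could look something like the sketch below. It is not part of the original answer: the names DataTableTyping and WithTypedColumn are hypothetical, and it assumes every non-null value in the chosen column parses cleanly under the invariant culture (an empty string would throw a FormatException for numeric types).

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper (not from the original answer): returns a copy of the
// table in which one string column has been converted to another type.
static class DataTableTyping
{
    public static DataTable WithTypedColumn(DataTable source, string columnName, Type newType)
    {
        // Clone copies the schema only; a column's DataType can be changed
        // while the table holds no rows.
        DataTable result = source.Clone();
        result.Columns[columnName].DataType = newType;

        foreach (DataRow row in source.Rows)
        {
            DataRow copy = result.NewRow();
            foreach (DataColumn col in source.Columns)
            {
                object value = row[col];
                if (col.ColumnName == columnName && !(value is DBNull))
                {
                    // Convert explicitly so parsing does not depend on the machine's culture.
                    value = Convert.ChangeType(value, newType, CultureInfo.InvariantCulture);
                }
                copy[col.ColumnName] = value;
            }
            result.Rows.Add(copy);
        }
        return result;
    }
}
```

For example, WithTypedColumn(table, "Age", typeof(int)) would give back a table whose Age column holds integers, leaving the original table untouched.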

This is my method for importing a table with a string[] separator. It also handles the case where the record I am reading continues onto the next line of the CSV or text file: in that case I keep reading lines until the field count matches the number of columns in the first (header) row.

public static DataTable ImportCSV(string fullPath, string[] sepString)
    {
        DataTable dt = new DataTable();
        using (StreamReader sr = new StreamReader(fullPath))
        {
           //stream uses using statement because it implements iDisposable
            string firstLine = sr.ReadLine();
            var headers = firstLine.Split(sepString, StringSplitOptions.None);
            foreach (var header in headers)
            {
               //create column headers
                dt.Columns.Add(header);
            }
            int columnInterval = headers.Count();
            string newLine = sr.ReadLine();
            while (newLine != null)
            {
                //loop adds each row to the datatable
                var fields = newLine.Split(sepString, StringSplitOptions.None); // csv delimiter    
                var currentLength = fields.Count();
                if (currentLength < columnInterval)
                {
                    while (currentLength < columnInterval)
                    {
                       //if the count of items in the row is less than the column count, append the next line until the count matches
                        string continuation = sr.ReadLine();
                        if (continuation == null)
                        {
                            //end of file reached: stop here instead of looping forever
                            break;
                        }
                        newLine += continuation;
                        currentLength = newLine.Split(sepString, StringSplitOptions.None).Count();
                    }
                    fields = newLine.Split(sepString, StringSplitOptions.None);
                }
                if (currentLength > columnInterval)
                {  
                    //ideally never executes - but if csv row has too many separators, line is skipped
                    newLine = sr.ReadLine();
                    continue;
                }
                dt.Rows.Add(fields);
                newLine = sr.ReadLine();
            }
            sr.Close();
        }

        return dt;
    }

    private static DataTable LoadCsvData(string refPath)
    {
        var cfg = new Configuration() { Delimiter = ",", HasHeaderRecord = true };
        var result = new DataTable();
        using (var sr = new StreamReader(refPath, Encoding.UTF8, false, 16384 * 2))
        {
            using (var rdr = new CsvReader(sr, cfg))
            using (var dataRdr = new CsvDataReader(rdr))
            {
                result.Load(dataRdr);
            }
        }
        return result;
    }

Using: https://joshclose.github.io/CsvHelper/

public class Csv
{
    public static DataTable DataSetGet(string filename, string separatorChar, out List<string> errors)
    {
        errors = new List<string>();
        var table = new DataTable("StringLocalization");
        using (var sr = new StreamReader(filename, Encoding.Default))
        {
            string line;
            var i = 0;
            while (sr.Peek() >= 0)
            {
                try
                {
                    line = sr.ReadLine();
                    if (string.IsNullOrEmpty(line)) continue;
                    var values = line.Split(new[] {separatorChar}, StringSplitOptions.None);
                    var row = table.NewRow();
                    for (var colNum = 0; colNum < values.Length; colNum++)
                    {
                        var value = values[colNum];
                        if (i == 0)
                        {
                            table.Columns.Add(value, typeof (String));
                        }
                        else
                        {
                            row[table.Columns[colNum]] = value;
                        }
                    }
                    if (i != 0) table.Rows.Add(row);
                }
                catch(Exception ex)
                {
                    errors.Add(ex.Message);
                }
                i++;
            }
        }
        return table;
    }
}

I came across this piece of code that uses LINQ and a regex to parse a CSV file. The referring article is now over a year and a half old, but I have not come across a neater way to parse a CSV using LINQ (and regex) than this. The caveats are that the regex applied here is for comma-delimited files (it will detect commas inside quotes!) and that it may not take well to headers, but there is a way to overcome these. Take a peek:

Dim lines As String() = System.IO.File.ReadAllLines(strCustomerFile)
Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
Dim custs = From line In lines _
            Let data = r.Split(line) _
                Select New With {.custnmbr = data(0), _
                                 .custname = data(1)}
For Each cust In custs
    strCUSTNMBR = Replace(cust.custnmbr, Chr(34), "")
    strCUSTNAME = Replace(cust.custname, Chr(34), "")
Next

Here's a solution that uses ADO.NET's ODBC text driver:

Dim csvFileFolder As String = "C:\YourFileFolder"
Dim csvFileName As String = "YourFile.csv"

'Note that the folder is specified in the connection string,
'not the file. That's specified in the SELECT query, later.
Dim connString As String = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" _
    & csvFileFolder & ";Extended Properties=""Text;HDR=No;FMT=Delimited"""
Dim conn As New Odbc.OdbcConnection(connString)

'Open a data adapter, specifying the file name to load
Dim da As New Odbc.OdbcDataAdapter("SELECT * FROM [" & csvFileName & "]", conn)
'Then fill a data table, which can be bound to a grid
Dim dt As New DataTable
da.Fill(dt)

grdCSVData.DataSource = dt

Once filled, you can set properties of the DataTable, like ColumnName, to make use of all the powers of the ADO.NET data objects.

In VS2008 you can use LINQ to achieve the same effect.

NOTE: This may be a duplicate of this SO question.

Can't resist adding my own spin to this. This is so much better and more compact than what I've used in the past.

This solution:

  • Does not depend on a database driver or 3rd party library.
  • Will not fail on duplicate column names.
  • Handles commas in the data.
  • Handles any delimiter, not just commas (although that is the default).

Here's what I came up with:

  Public Function ToDataTable(FileName As String, Optional Delimiter As String = ",") As DataTable
    ToDataTable = New DataTable
    Using TextFieldParser As New Microsoft.VisualBasic.FileIO.TextFieldParser(FileName) With
      {.HasFieldsEnclosedInQuotes = True, .TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited, .TrimWhiteSpace = True}
      With TextFieldParser
        .SetDelimiters({Delimiter})
        .ReadFields.ToList.Unique.ForEach(Sub(x) ToDataTable.Columns.Add(x))
        ToDataTable.Columns.Cast(Of DataColumn).ToList.ForEach(Sub(x) x.AllowDBNull = True)
        Do Until .EndOfData
          ToDataTable.Rows.Add(.ReadFields.Select(Function(x) Text.BlankToNothing(x)).ToArray)
        Loop
      End With
    End Using
  End Function

It depends on an extension method (Unique) to handle duplicate column names, which can be found in my answer to "How to append unique numbers to a list of strings".
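
The Unique extension itself is not reproduced in this thread. As a rough C# sketch of the same idea (not the linked implementation; the names ColumnNames and MakeUnique are hypothetical), duplicate header names can be made unique by appending a numeric suffix. The comparison is case-insensitive here, which is a deliberately conservative assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not the linked Unique extension): appends _1, _2, ...
// to any name that has already been used, so every column name is distinct.
static class ColumnNames
{
    public static List<string> MakeUnique(IEnumerable<string> names)
    {
        // Assumption: names that differ only by case are treated as duplicates.
        var used = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();

        foreach (string name in names)
        {
            string candidate = name;
            int suffix = 1;
            // HashSet.Add returns false when the value is already present.
            while (!used.Add(candidate))
            {
                candidate = name + "_" + suffix;
                suffix++;
            }
            result.Add(candidate);
        }
        return result;
    }
}
```

The result can be passed to DataTable.Columns.Add one name at a time, in the same way the VB code above does after calling Unique.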

And here's the BlankToNothing helper function:

  Public Function BlankToNothing(ByVal Value As String) As Object 
    If String.IsNullOrEmpty(Value) Then Return Nothing
    Return Value
  End Function

With Cinchoo ETL, an open source library, you can easily convert a CSV file to a DataTable with a few lines of code.

using (var p = new ChoCSVReader(** YOUR CSV FILE **)
     .WithFirstLineHeader()
    )
{
    var dt = p.AsDataTable();
}

For more information, please visit the CodeProject article.

Hope it helps.

I use a library called ExcelDataReader; you can find it on NuGet. Be sure to install both ExcelDataReader and the ExcelDataReader.DataSet extension (the latter provides the required AsDataSet method referenced below).

I encapsulated everything in one function, so you can copy it into your code directly. Give it the path to a CSV file and it returns a DataSet with one table.

public static DataSet GetDataSet(string filepath)
{
   var stream = File.OpenRead(filepath);

   try
   {
       var reader = ExcelReaderFactory.CreateCsvReader(stream, new ExcelReaderConfiguration()
       {
           LeaveOpen = false
       });

       var result = reader.AsDataSet(new ExcelDataSetConfiguration()
       {
           // Gets or sets a value indicating whether to set the DataColumn.DataType 
           // property in a second pass.
           UseColumnDataType = true,

           // Gets or sets a callback to determine whether to include the current sheet
           // in the DataSet. Called once per sheet before ConfigureDataTable.
           FilterSheet = (tableReader, sheetIndex) => true,

           // Gets or sets a callback to obtain configuration options for a DataTable. 
           ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
           {
               // Gets or sets a value indicating the prefix of generated column names.
               EmptyColumnNamePrefix = "Column",

               // Gets or sets a value indicating whether to use a row from the 
               // data as column names.
               UseHeaderRow = true,

               // Gets or sets a callback to determine which row is the header row. 
               // Only called when UseHeaderRow = true.
               ReadHeaderRow = (rowReader) =>
               {
                   // F.ex skip the first row and use the 2nd row as column headers:
                   //rowReader.Read();
               },

               // Gets or sets a callback to determine whether to include the 
               // current row in the DataTable.
               FilterRow = (rowReader) =>
               {
                   return true;
               },

               // Gets or sets a callback to determine whether to include the specific
               // column in the DataTable. Called once per column after reading the 
               // headers.
               FilterColumn = (rowReader, columnIndex) =>
               {
                   return true;
               }
           }
       });

       return result;
   }
   catch (Exception ex)
   {
       return null;
   }
   finally
   {
       stream.Close();
       stream.Dispose();
   }
}

I've recently written a CSV parser for .NET that I'm claiming is currently the fastest available as a NuGet package: Sylvan.Data.Csv.

Using this library to load a DataTable is extremely easy.

using var dr = CsvDataReader.Create("data.csv");
var dt = new DataTable();
dt.Load(dr);

Assuming your file is a standard comma-separated file with headers, that's all you need. There are also options to allow reading files without headers, using alternate delimiters, etc.

It is also possible to provide a custom schema for the CSV file so that columns can be treated as something other than string values. This allows the DataTable columns to be loaded with values that are easier to work with, as you won't have to coerce them when you access them.

This can be accomplished by providing an ICsvSchemaProvider implementation, which exposes a single method: DbColumn? GetColumn(string? name, int ordinal). The DbColumn type is an abstract type defined in System.Data.Common, which means that you would have to provide an implementation of that too if you implement your own schema provider. The DbColumn type exposes a variety of metadata about a column, and you can choose to expose as much of the metadata as needed. The most important metadata is the DataType and AllowDBNull.

A very simple implementation that would expose type information could look like the following:

class TypedCsvColumn : DbColumn
{
    public TypedCsvColumn(Type type, bool allowNull)
    {
        // if you assign ColumnName here, it will override whatever is in the csv header
        this.DataType = type;
        this.AllowDBNull = allowNull;
    }
}
    
class TypedCsvSchema : ICsvSchemaProvider
{
    List<TypedCsvColumn> columns;

    public TypedCsvSchema()
    {
        this.columns = new List<TypedCsvColumn>();
    }

    public TypedCsvSchema Add(Type type, bool allowNull = false)
    {
        this.columns.Add(new TypedCsvColumn(type, allowNull));
        return this;
    }

    DbColumn? ICsvSchemaProvider.GetColumn(string? name, int ordinal)
    {
        return ordinal < columns.Count ? columns[ordinal] : null;
    }
}

To consume this implementation you would do the following:


var schema = new TypedCsvSchema()
    .Add(typeof(int))
    .Add(typeof(string))
    .Add(typeof(double), true)
    .Add(typeof(DateTime))
    .Add(typeof(DateTime), true);
var options = new CsvDataReaderOptions
{
    Schema = schema
};


using var dr = CsvDataReader.Create("data.csv", options);
...

Use this; one function solves all the problems with commas and quotes:

public static DataTable CsvToDataTable(string strFilePath)
    {

        if (File.Exists(strFilePath))
        {

            string[] Lines;
            string CSVFilePathName = strFilePath;

            Lines = File.ReadAllLines(CSVFilePathName);
            while (Lines[0].EndsWith(","))
            {
                Lines[0] = Lines[0].Remove(Lines[0].Length - 1);
            }
            string[] Fields;
            Fields = Lines[0].Split(new char[] { ',' });
            int Cols = Fields.GetLength(0);
            DataTable dt = new DataTable();
            //1st row must be column names; force lower case to ensure matching later on.
            for (int i = 0; i < Cols; i++)
                dt.Columns.Add(Fields[i], typeof(string));
            DataRow Row;
            int rowcount = 0;
            try
            {
                string[] ToBeContinued = new string[]{};
                bool lineToBeContinued = false;
                for (int i = 1; i < Lines.GetLength(0); i++)
                {
                    if (!Lines[i].Equals(""))
                    {
                        Fields = Lines[i].Split(new char[] { ',' });
                        string temp0 = string.Join("", Fields).Replace("\"\"", "");
                        int quaotCount0 = temp0.Count(c => c == '"');
                        if (Fields.GetLength(0) < Cols || lineToBeContinued || quaotCount0 % 2 != 0)
                        {
                            if (ToBeContinued.GetLength(0) > 0)
                            {
                                ToBeContinued[ToBeContinued.Length - 1] += "\n" + Fields[0];
                                Fields = Fields.Skip(1).ToArray();
                            }
                            string[] newArray = new string[ToBeContinued.Length + Fields.Length];
                            Array.Copy(ToBeContinued, newArray, ToBeContinued.Length);
                            Array.Copy(Fields, 0, newArray, ToBeContinued.Length, Fields.Length);
                            ToBeContinued = newArray;
                            string temp = string.Join("", ToBeContinued).Replace("\"\"", "");
                            int quaotCount = temp.Count(c => c == '"');
                            if (ToBeContinued.GetLength(0) >= Cols && quaotCount % 2 == 0 )
                            {
                                Fields = ToBeContinued;
                                ToBeContinued = new string[] { };
                                lineToBeContinued = false;
                            }
                            else
                            {
                                lineToBeContinued = true;
                                continue;
                            }
                        }

                        //modified by Teemo @2016 09 13
                        //handle ',' and '"'
                        //Deserialize CSV following Excel's rule:
                        // 1: If there is commas in a field, quote the field.
                        // 2: Two consecutive quotes indicate a user's quote.

                        List<int> singleLeftquota = new List<int>();
                        List<int> singleRightquota = new List<int>();

                        //combine fields if number of commas match
                        if (Fields.GetLength(0) > Cols) 
                        {
                            bool lastSingleQuoteIsLeft = true;
                            for (int j = 0; j < Fields.GetLength(0); j++)
                            {
                                bool leftOddquota = false;
                                bool rightOddquota = false;
                                if (Fields[j].StartsWith("\"")) 
                                {
                                    int numberOfConsecutiveQuotes = 0;
                                    foreach (char c in Fields[j]) //count the leading quotes
                                    {
                                        if (c == '"')
                                        {
                                            numberOfConsecutiveQuotes++;
                                        }
                                        else
                                        {
                                            break;
                                        }
                                    }
                                    if (numberOfConsecutiveQuotes % 2 == 1) //an odd number of leading quotes indicates a system (delimiting) quote
                                    {
                                        leftOddquota = true;
                                    }
                                }

                                if (Fields[j].EndsWith("\""))
                                {
                                    int numberOfConsecutiveQuotes = 0;
                                    for (int jj = Fields[j].Length - 1; jj >= 0; jj--)
                                    {
                                        if (Fields[j].Substring(jj,1) == "\"") //count the trailing quotes
                                        {
                                            numberOfConsecutiveQuotes++;
                                        }
                                        else
                                        {
                                            break;
                                        }
                                    }

                                    if (numberOfConsecutiveQuotes % 2 == 1) //an odd number of trailing quotes indicates a system (delimiting) quote
                                    {
                                        rightOddquota = true;
                                    }
                                }
                                if (leftOddquota && !rightOddquota)
                                {
                                    singleLeftquota.Add(j);
                                    lastSingleQuoteIsLeft = true;
                                }
                                else if (!leftOddquota && rightOddquota)
                                {
                                    singleRightquota.Add(j);
                                    lastSingleQuoteIsLeft = false;
                                }
                                else if (Fields[j] == "\"") //the field consists of a single quote
                                {
                                    if (lastSingleQuoteIsLeft)
                                    {
                                        singleRightquota.Add(j);
                                    }
                                    else
                                    {
                                        singleLeftquota.Add(j);
                                    }
                                }
                            }
                            if (singleLeftquota.Count == singleRightquota.Count)
                            {
                                int insideCommas = 0;
                                for (int indexN = 0; indexN < singleLeftquota.Count; indexN++)
                                {
                                    insideCommas += singleRightquota[indexN] - singleLeftquota[indexN];
                                }
                                if (Fields.GetLength(0) - Cols >= insideCommas) //probably matched
                                {
                                    int validFildsCount = insideCommas + Cols; //(Fields.GetLength(0) - insideCommas) may exceed Cols
                                    String[] temp = new String[validFildsCount];
                                    int totalOffSet = 0;
                                    for (int iii = 0; iii < validFildsCount - totalOffSet; iii++)
                                    {
                                        bool combine = false;
                                        int storedIndex = 0;
                                        for (int iInLeft = 0; iInLeft < singleLeftquota.Count; iInLeft++)
                                        {
                                            if (iii + totalOffSet == singleLeftquota[iInLeft])
                                            {
                                                combine = true;
                                                storedIndex = iInLeft;
                                                break;
                                            }
                                        }
                                        if (combine)
                                        {
                                            int offset = singleRightquota[storedIndex] - singleLeftquota[storedIndex];
                                            for (int combineI = 0; combineI <= offset; combineI++)
                                            {
                                                temp[iii] += Fields[iii + totalOffSet + combineI] + ",";
                                            }
                                            temp[iii] = temp[iii].Remove(temp[iii].Length - 1, 1);
                                            totalOffSet += offset;
                                        }
                                        else
                                        {
                                            temp[iii] = Fields[iii + totalOffSet];
                                        }
                                    }
                                    Fields = temp;
                                }
                            }
                        }
                        Row = dt.NewRow();
                        for (int f = 0; f < Cols; f++)
                        {
                            Fields[f] = Fields[f].Replace("\"\"", "\""); //Two consecutive quotes indicate a user's quote
                            if (Fields[f].StartsWith("\""))
                            {
                                if (Fields[f].EndsWith("\""))
                                {
                                    Fields[f] = Fields[f].Remove(0, 1);
                                    if (Fields[f].Length > 0)
                                    {
                                        Fields[f] = Fields[f].Remove(Fields[f].Length - 1, 1);
                                    }
                                }
                            }
                            Row[f] = Fields[f];
                        }
                        dt.Rows.Add(Row);
                        rowcount++;
                    }
                }
            }
            catch (Exception ex)
            {
                // Keep the original exception as InnerException so its stack trace is not lost
                throw new Exception("row: " + (rowcount + 2) + ", " + ex.Message, ex);
            }
            //OleDbConnection connection = new OleDbConnection(string.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}; Extended Properties=""text;HDR=Yes;FMT=Delimited"";", FilePath + FileName));
            //OleDbCommand command = new OleDbCommand("SELECT * FROM " + FileName, connection);
            //OleDbDataAdapter adapter = new OleDbDataAdapter(command);
            //DataTable dt = new DataTable();
            //adapter.Fill(dt);
            //adapter.Dispose();
            return dt;
        }
        else
            return null;

        //OleDbConnection connection = new OleDbConnection(string.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}; Extended Properties=""text;HDR=Yes;FMT=Delimited"";", strFilePath));
        //OleDbCommand command = new OleDbCommand("SELECT * FROM " + strFileName, connection);
        //OleDbDataAdapter adapter = new OleDbDataAdapter(command);
        //DataTable dt = new DataTable();
        //adapter.Fill(dt);
        //return dt;
    }
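
The two Excel rules named in the comments above (a field containing commas is quoted, and two consecutive quotes stand for one literal quote) can also be applied by walking a line character by character instead of splitting first and re-joining afterwards. The following is only a minimal sketch of that alternative, not the code from the answer: `CsvLineSketch` and `SplitCsvLine` are hypothetical names, and the sketch handles a single line, so quoted fields that span several lines (which the answer tracks with `ToBeContinued`) are out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLineSketch
{
    // Hypothetical helper: splits one CSV line into fields.
    public static List<string> SplitCsvLine(string line, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote of the field
                    }
                }
                else
                {
                    current.Append(c);       // separators inside quotes are data
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote of a field
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing separator
        return fields;
    }
}
```

For example, `CsvLineSketch.SplitCsvLine("a,\"b,c\",\"say \"\"hi\"\"\"")` yields the three fields `a`, `b,c` and `say "hi"`.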

Just sharing these extension methods; I hope they help someone.

public static List<string> ToCSV(this DataSet ds, char separator = '|')
{
    List<string> lResult = new List<string>();

    foreach (DataTable dt in ds.Tables)
    {
        StringBuilder sb = new StringBuilder();
        IEnumerable<string> columnNames = dt.Columns.Cast<DataColumn>().
                                          Select(column => column.ColumnName);
        sb.AppendLine(string.Join(separator.ToString(), columnNames));

        foreach (DataRow row in dt.Rows)
        {
            IEnumerable<string> fields = row.ItemArray.Select(field =>
              string.Concat("\"", field.ToString().Replace("\"", "\"\""), "\""));
            sb.AppendLine(string.Join(separator.ToString(), fields));
        }

        lResult.Add(sb.ToString());
    }
    return lResult;
}

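public static DataSet CSVtoDataSet(this List<string> collectionCSV, char separator = '|')
{
    var ds = new DataSet();

    foreach (var csv in collectionCSV)
    {
        var dt = new DataTable();

        var readHeader = false;
        foreach (var line in csv.Split(new[] { Environment.NewLine }, StringSplitOptions.None))
        {
            // ToCSV ends every line with a newline, so the last split entry is empty
            if (line.Length == 0)
                continue;

            if (!readHeader)
            {
                foreach (var c in line.Split(separator))
                    dt.Columns.Add(c);
                readHeader = true; // the first line holds the column names
            }
            else
            {
                // Note: a plain Split neither removes the quotes added by ToCSV
                // nor handles separators inside quoted fields
                dt.Rows.Add(line.Split(separator));
            }
        }

        ds.Tables.Add(dt);
    }

    return ds;
}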

Public Function ReadCsvFileToDataTable(strFilePath As String) As DataTable
    Dim dtCsv As DataTable = New DataTable()
    Dim fullText As String
    Using sr As StreamReader = New StreamReader(strFilePath)
        fullText = sr.ReadToEnd()
    End Using

    Dim rows As String() = fullText.Split(ControlChars.Lf)
    For i As Integer = 0 To rows.Length - 1
        ' Drop the carriage return left over from Windows line endings
        Dim line As String = rows(i).TrimEnd(ControlChars.Cr)
        ' Skip blank lines, including the empty entry after a trailing newline
        If line.Length = 0 Then Continue For

        ' Note: a plain Split does not handle commas inside quoted fields
        Dim rowValues As String() = line.Split(","c)
        If dtCsv.Columns.Count = 0 Then
            ' The first non-empty line holds the column names
            For Each header As String In rowValues
                dtCsv.Columns.Add(header)
            Next
        Else
            Dim dr As DataRow = dtCsv.NewRow()
            For k As Integer = 0 To rowValues.Length - 1
                dr(k) = rowValues(k)
            Next
            dtCsv.Rows.Add(dr)
        End If
    Next
    Return dtCsv
End Function
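
Splitting each line on `","c` breaks as soon as a quoted field contains a comma. One way around this is the `TextFieldParser` class in `Microsoft.VisualBasic.FileIO`, which understands quoted fields and can be used from C# as well. This is a minimal sketch in C#, assuming the project can reference the Microsoft.VisualBasic assembly; `TextFieldParserSketch` and `ReadCsv` are hypothetical names, and the first row is assumed to hold the column names.

```csharp
using System;
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSketch
{
    // Hypothetical helper: reads comma-delimited text into a DataTable,
    // treating the first row as the header row.
    public static DataTable ReadCsv(TextReader reader)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "x, y" stays one field

            bool isHeader = true;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (isHeader)
                {
                    foreach (string name in fields)
                        table.Columns.Add(name);
                    isHeader = false;
                }
                else
                {
                    // Throws if a row has more fields than there are columns
                    table.Rows.Add(fields);
                }
            }
        }
        return table;
    }
}
```

To read a file, pass a `StreamReader`, for example `TextFieldParserSketch.ReadCsv(new StreamReader(path))`; disposing the parser also closes the reader.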

A CSV-to-DataTable converter. You can choose the separator, whether the first row contains headers (isFirstRowHeaders), and a prefix for extra headers, which is used when the first row is not a full list of headers or when all headers are generated automatically.

    public DataTable GetDataFromCsv(string path, char separator, bool isFirstRowHeaders = true, string prefixAutoHeader = "AutoHeader_")
    {
        DataTable dt = new DataTable();
        try
        {
            string csvData;
            using (StreamReader sr = new StreamReader(path))
            {
                csvData = sr.ReadToEnd();
            }

            // Split csvData into rows, dropping the '\r' of Windows line endings and empty lines
            List<string> csvRows = csvData.Split('\n')
                                          .Select(r => r.TrimEnd('\r'))
                                          .Where(r => r.Length > 0)
                                          .ToList();
            if (csvRows.Count == 0)
            {
                return dt;
            }

            // Split rows into cells using the selected separator
            List<List<string>> csvCells = csvRows.Select(r => new List<string>(r.Split(separator))).ToList();

            // Find the longest row so that extra headers can be added
            int maxSizeRow = csvCells.Max(r => r.Count);

            // If isFirstRowHeaders, fill the DataTable headers from the first CSV row
            if (isFirstRowHeaders)
            {
                foreach (string header in csvCells[0])
                {
                    dt.Columns.Add(header);
                }
            }

            // Add extra headers, or create all headers automatically if isFirstRowHeaders == false
            for (int i = dt.Columns.Count; i < maxSizeRow; i++)
            {
                dt.Columns.Add(prefixAutoHeader + i);
            }

            // Fill the DataTable, skipping the first row if it holds the headers
            foreach (List<string> row in csvCells.Skip(isFirstRowHeaders ? 1 : 0))
            {
                DataRow toInsert = dt.NewRow();
                for (int i = 0; i < row.Count; i++)
                {
                    toInsert[i] = row[i];
                }
                dt.Rows.Add(toInsert);
            }
            return dt;
        }
        catch (Exception)
        {
            // Any failure (missing file, duplicate header names, ...) is reported as null
            return null;
        }
    }
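
The header handling described above is easiest to see in isolation. The sketch below only builds the columns for rows of unequal length; `AutoHeaderSketch` and `BuildColumns` are hypothetical names used for illustration and are not part of the answer's method.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class AutoHeaderSketch
{
    // Hypothetical helper: creates one column per cell of the widest row.
    // Names come from the first row when isFirstRowHeaders is true; any
    // remaining columns are named prefix + column index.
    public static DataTable BuildColumns(List<List<string>> rows,
                                         bool isFirstRowHeaders,
                                         string prefix = "AutoHeader_")
    {
        var dt = new DataTable();
        if (rows.Count == 0)
        {
            return dt;
        }

        int width = rows.Max(r => r.Count);

        if (isFirstRowHeaders)
        {
            foreach (string header in rows[0])
            {
                dt.Columns.Add(header);
            }
        }

        // Start at the number of columns already present, so generated
        // names continue with the next column index
        for (int i = dt.Columns.Count; i < width; i++)
        {
            dt.Columns.Add(prefix + i);
        }

        return dt;
    }
}
```

With the rows `id,name` and `1,Ann,extra`, this gives the columns `id`, `name` and `AutoHeader_2` when the first row is treated as headers, and `AutoHeader_0`, `AutoHeader_1` and `AutoHeader_2` when it is not.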

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address.