
Populate database table from CSV

I am trying to populate a DataTable with values read from a CSV file. The format of the DataTable should match a corresponding table in a database.

The CSV file has about 80 columns, so I don't want to type out everything by hand. The column names in the CSV file don't exactly match the column names in the database. In addition, two columns whose data is not present in the CSV have to be added manually.

The problem is converting the string data from the CSV file to the correct type for each DataTable column.

Currently, my approach is:

  1. I read the table template from the database and use it to create my new DataTable.

  2. I create a map from the column positions in the CSV file to the column positions in the database.

  3. I try to insert each value from the CSV file into the DataTable. This is where my code fails, because the data is of the wrong type. As stated above, since there are so many columns, I don't want to do the conversion manually but rather infer the type from the table template. Also, some columns can contain null values.

My code

public static DataTable ReadAssets(string strFilePath, DateTime reportingDate, Enums.ReportingBases reportingBasis, char sep=',')
{
    //Reads the table template from the database
    DataTable dt = DbInterface.Db.GetTableTemplate("reports.Assets");

    var dbColumnNames = dt.Columns.Cast<DataColumn>().Select(x => x.ColumnName).ToList();

    //These columns are not present in the csv data and so they have to be added manually
    int posReportingDate = dbColumnNames.IndexOf("ReportingDate");
    int posReportingBasis = dbColumnNames.IndexOf("ReportingBasis");

    //read the csv and populate the table
    using (StreamReader sr = new (strFilePath))
    {
        string[] csvColumnNames = sr.ReadLine().Split(sep);

        //creates an <int, int> dictionary that maps the columns
        var columnMap = CreateColumnMap(dbColumnNames.ToArray(), csvColumnNames);
            
        while (!sr.EndOfStream)
        {
            string[] csvRow = sr.ReadLine().Split(sep);
            DataRow dr = dt.NewRow();

            dr[posReportingDate] = reportingDate;
            dr[posReportingBasis] = reportingBasis.ToString();

            foreach(var posPair in columnMap)
            {
                //This is where the code fails.... I need a conversion to the correct type here.
                dr[posPair.Value] = csvRow[posPair.Key];
            }

            dt.Rows.Add(dr);
        }
    }
    return dt;
}
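
CreateColumnMap is not shown here. A minimal sketch of what such a helper might look like, assuming the CSV and database names differ only in case, spacing, and punctuation (that matching rule and the ColumnMapper class name are hypothetical, not taken from the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnMapper
{
    // Reduces a header to lowercase letters and digits so that
    // "Reporting Date", "reporting_date" and "ReportingDate" compare equal.
    // Assumption: this is the only kind of mismatch between the two name sets.
    private static string Normalize(string name) =>
        new string(name.Where(c => char.IsLetterOrDigit(c))
                       .Select(c => char.ToLowerInvariant(c))
                       .ToArray());

    // Returns csvIndex -> dbIndex for every CSV header that matches a database column.
    // CSV headers without a match are left out of the map.
    public static Dictionary<int, int> CreateColumnMap(string[] dbColumnNames, string[] csvColumnNames)
    {
        var dbIndexByName = new Dictionary<string, int>();
        for (int i = 0; i < dbColumnNames.Length; i++)
            dbIndexByName[Normalize(dbColumnNames[i])] = i;

        var map = new Dictionary<int, int>();
        for (int i = 0; i < csvColumnNames.Length; i++)
        {
            if (dbIndexByName.TryGetValue(Normalize(csvColumnNames[i]), out int dbIndex))
                map[i] = dbIndex;
        }
        return map;
    }
}
```

The key is the CSV position and the value is the database position, which matches how posPair.Key and posPair.Value are used in the loop above. Names that differ more substantially would need an explicit lookup table instead.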

I maintain a couple of libraries that can help with this scenario: Sylvan.Data and Sylvan.Data.Csv. They are both open-source, MIT licensed, and available on nuget.org. Together they allow applying a schema to CSV data and attaching extra columns. Doing this makes it possible to use SqlBulkCopy to load the data directly into the database efficiently, without building a DataTable first. My CSV parser also happens to be the fastest in the .NET ecosystem.

As an example, given the following target SQL table:

create table MyTable (
Name varchar(32),
Value int,
ValueDate datetime,
InsertDate datetime,
RowNum int
)

A CSV file, data.csv, containing the following:

a,b,c
a,1,2022-01-01
b,2,2022-01-02

Here is a complete C# sample program (written for .NET 6, using top-level statements and implicit usings) that bulk copies the CSV data, along with the "extra" columns, into a database table.

using Sylvan.Data; // v1.1.0
using Sylvan.Data.Csv; // v0.1.1
using System.Data.SqlClient;

const string SourceCsvFile = "data.csv";
const string TargetTableName = "MyTable";

var conn = new SqlConnection();
conn.ConnectionString = new SqlConnectionStringBuilder
{
    DataSource = ".",
    InitialCatalog = "Test",
    IntegratedSecurity = true
}.ConnectionString;
conn.Open();

// read schema for the target table
var cmd = conn.CreateCommand();
cmd.CommandText = $"select top 0 * from {TargetTableName}";

var reader = cmd.ExecuteReader();
var schema = reader.GetColumnSchema();
reader.Close();

// apply the database schema to the CSV data
var opts = new CsvDataReaderOptions { Schema = new CsvSchema(schema) };
var csvReader = CsvDataReader.Create(SourceCsvFile, opts);

// attach additional external columns to the CSV data,
// named after the target table's InsertDate and RowNum columns
var data = csvReader.WithColumns(
    new CustomDataColumn<DateTime>("InsertDate", r => DateTime.UtcNow),
    new CustomDataColumn<int>("RowNum", r => csvReader.RowNumber)
);

// bulk copy the data into the target table
var bc = new SqlBulkCopy(conn);
bc.DestinationTableName = TargetTableName;
bc.WriteToServer(data);

Hopefully you find this to be an elegant solution.
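
If adding a library is not an option, the conversion the question asks about can also be done against the DataTable directly, since each DataColumn exposes its CLR type through DataType. The following is only a minimal sketch: the CsvValueConverter class and ConvertForColumn method are hypothetical names, and it assumes that an empty field means NULL and that numbers and dates are written in invariant-culture formats.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class CsvValueConverter
{
    // Converts raw CSV text to the CLR type declared by the target DataColumn.
    // Assumption: an empty or whitespace-only field represents a database NULL.
    public static object ConvertForColumn(string raw, DataColumn column)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return DBNull.Value;

        Type target = column.DataType;
        string text = raw.Trim();

        // Convert.ChangeType cannot convert a string to these types, so parse them explicitly.
        if (target == typeof(Guid))
            return Guid.Parse(text);
        if (target == typeof(TimeSpan))
            return TimeSpan.Parse(text, CultureInfo.InvariantCulture);
        if (target == typeof(DateTimeOffset))
            return DateTimeOffset.Parse(text, CultureInfo.InvariantCulture);

        // Handles int, long, decimal, double, bool, DateTime, string and similar types.
        return Convert.ChangeType(text, target, CultureInfo.InvariantCulture);
    }
}
```

In the loop from the question, the failing assignment would then become dr[posPair.Value] = CsvValueConverter.ConvertForColumn(csvRow[posPair.Key], dt.Columns[posPair.Value]);. Note that Convert.ChangeType parses booleans only from "True"/"False" and does not handle binary columns, so files that use other encodings would need extra cases.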
