
How to import large CSV file to SQL Server using .NET Core API (C#)

I have tried the code below. It works for a small amount of data, but how can I import a large amount of data (around 1,000,000 records in the CSV file) in chunks?

public void LoadFile()
{
    string filePath = @"F:\Test\Book1.csv";

    using (var reader = new StreamReader(filePath))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        var records = csv.GetRecords<SalesRecord>();
        foreach (var record in records)
        {
            SalesRecord _salesRecord = new SalesRecord();

            _salesRecord.Region = record.Region;
            _salesRecord.Country = record.Country;
            _salesRecord.ItemType = record.ItemType;
            _salesRecord.SalesChannel = record.SalesChannel;
            _salesRecord.OrderPriority = record.OrderPriority;
            _salesRecord.OrderDate = record.OrderDate;
            _salesRecord.OrderID = record.OrderID;
            _salesRecord.ShipDate = record.ShipDate;
            _salesRecord.UnitsSold = record.UnitsSold;
            _salesRecord.UnitPrice = record.UnitPrice;
            _salesRecord.UnitCost = record.UnitCost;
            _salesRecord.TotalRevenue = record.TotalRevenue;
            _salesRecord.TotalCost = record.TotalCost;
            _salesRecord.TotalProfit = record.TotalProfit;

            _context.SalesRecords.Add(_salesRecord);
        }

        _context.SaveChanges();
    }
}

Neither EF Core nor any other ORM is meant for bulk imports. ORMs are meant to give the impression of working with in-memory objects. In an ETL/bulk import job there are no objects to begin with, except perhaps Row, Field and Transformation. To scale, an ETL job should not load any more data than necessary. The caching, querying and mapping features of an ORM are just overhead in this case, adding large delays. Managing transactions and batch size, one of the most important considerations in ETL, is hard and can only be configured indirectly.
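
For contrast only, "chunking" with EF Core looks roughly like the sketch below. It is a sketch under assumptions, not a recommendation: it reuses the _context, csv and SalesRecord names from the question, picks an arbitrary batch size of 10,000, and relies on ChangeTracker.Clear(), which is only available in EF Core 5 and later. Even tuned this way, every row still pays the ORM's tracking and mapping overhead.

// Hypothetical chunked insert with EF Core (for contrast only).
const int batchSize = 10_000;                 // arbitrary example value
var batch = new List<SalesRecord>(batchSize);

_context.ChangeTracker.AutoDetectChangesEnabled = false;   // reduce per-row tracking cost

foreach (var record in csv.GetRecords<SalesRecord>())
{
    batch.Add(record);
    if (batch.Count == batchSize)
    {
        _context.SalesRecords.AddRange(batch);
        _context.SaveChanges();               // one round trip / transaction per chunk
        _context.ChangeTracker.Clear();       // EF Core 5+: drop tracked entities so memory stays flat
        batch.Clear();
    }
}

if (batch.Count > 0)
{
    _context.SalesRecords.AddRange(batch);
    _context.SaveChanges();
}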

The fastest way to import data from a client is to use the SqlBulkCopy class. It uses the same protocol as BCP or BULK INSERT to insert data with minimal logging - that means only data pages are logged, not individual INSERTs.

SqlBulkCopy accepts only a DataTable or an IDataReader. Luckily, CsvHelper provides the CsvDataReader class, which implements IDataReader, so one can copy rows from the CSV to the database directly:

using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    // Do any configuration to `CsvReader` before creating CsvDataReader.
    using (var dr = new CsvDataReader(csv))
    using (var con = new SqlConnection(connectionString))
    using (var bcp = new SqlBulkCopy(con))
    {
        bcp.DestinationTableName = "dbo.BulkCopyDemoMatchingColumns";
        con.Open();

        bcp.WriteToServer(dr);
    }
}

This matches source and target columns by position and expects the data types to match; any parsing or conversion will have to be done in the CsvReader configuration.
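
A minimal sketch of that configuration, assuming a comma-delimited file with a header row (both assumptions about the input file): recent versions of CsvHelper accept a CsvConfiguration when constructing the CsvReader.

// Assumed input format: comma-delimited, first row is a header.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ",",
    HasHeaderRecord = true    // the header row is consumed, so only data rows reach SqlBulkCopy
};

using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, config))
using (var dr = new CsvDataReader(csv))
{
    // pass dr to SqlBulkCopy.WriteToServer as shown above
}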

In most cases, the source and target columns have to be mapped explicitly, either by ordinal position or by name, through the ColumnMappings collection:

bcp.ColumnMappings.Add("SourceA", "TargetA");
bcp.ColumnMappings.Add("SourceB", "TargetB");

SqlBulkCopy can be configured to read data in a stream by setting EnableStreaming to true. By default, it will cache all rows in memory before sending them to the server. The batch size can be configured through the BatchSize property.
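
Both properties are set directly on the SqlBulkCopy instance. The sketch below reuses the dr reader from the earlier snippet and assumes a target table named dbo.SalesRecords; the batch size of 10,000 is only an example value.

using (var con = new SqlConnection(connectionString))
using (var bcp = new SqlBulkCopy(con))
{
    bcp.DestinationTableName = "dbo.SalesRecords";  // assumed table name
    bcp.EnableStreaming = true;   // stream rows from the IDataReader instead of buffering them all
    bcp.BatchSize = 10_000;       // rows per batch sent to the server
    bcp.BulkCopyTimeout = 0;      // no timeout for a long-running import
    con.Open();

    bcp.WriteToServer(dr);
}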

SqlBulkCopy can use one transaction per batch or one transaction for the entire operation. The way batches and transactions interact is explained in Transactions and Bulk Copy Operations.
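
If a single all-or-nothing import is wanted, one way (a sketch, using the same assumed names as above) is to pass an external SqlTransaction to the SqlBulkCopy constructor, so that all batches commit or roll back together:

using (var con = new SqlConnection(connectionString))
{
    con.Open();
    using (var tran = con.BeginTransaction())
    using (var bcp = new SqlBulkCopy(con, SqlBulkCopyOptions.Default, tran))
    {
        bcp.DestinationTableName = "dbo.SalesRecords";  // assumed table name
        bcp.BatchSize = 10_000;
        try
        {
            bcp.WriteToServer(dr);
            tran.Commit();      // all batches become visible at once
        }
        catch
        {
            tran.Rollback();    // nothing from any batch is kept
            throw;
        }
    }
}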

The SqlBulkCopyOptions parameter in the constructor can be used to specify other important settings that correspond to BULK INSERT options (see the sketch after this list), e.g.:

  • Whether to lock the target table, increasing performance
  • Whether to fire triggers
  • Whether to check constraints
  • Whether to preserve identity column values, etc.
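
The flags can be combined. The sketch below picks a few of them purely as an example, again with the assumed dbo.SalesRecords table and the dr reader from the earlier snippets:

var options = SqlBulkCopyOptions.TableLock          // take a bulk update lock on the target table
            | SqlBulkCopyOptions.KeepIdentity       // keep identity values coming from the source
            | SqlBulkCopyOptions.CheckConstraints;  // validate constraints while loading

using (var bcp = new SqlBulkCopy(connectionString, options))
{
    bcp.DestinationTableName = "dbo.SalesRecords";
    bcp.WriteToServer(dr);
}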

Importing any IEnumerable<T>

Any IEnumerable<T> collection can be used with SqlBulkCopy by wrapping it in [FastMember's](https://github.com/mgravell/fast-member#ever-needed-an-idatareader) ObjectReader:

IEnumerable<SomeType> data = ...

using (var bcp = new SqlBulkCopy(connection))
using (var reader = ObjectReader.Create(data))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(reader);
}
