
Best approach for inserting millions of rows into a SQL Server database

I am gathering data from multiple feeds, including APIs, Excel files, text files and Word files. I am using a relational database to store all the relationships; there are up to 10 one-to-many or many-to-many relationships.

The approach I am using is to write each entry into a .csv file and then call a stored procedure to bulk insert all of the entries. So in this case I can end up with 10 separate files, one for each table in my database.

I ran into two problems:

  • Transferring the files to the database server (on the same network)
  • Primary keys: I need to use GUIDs instead of auto-increment

What is the best approach for performance?

Two words: BULK INSERT

If you already have a CSV file, this is simply a matter of writing some SQL or C# (whichever you prefer) to execute a bulk insert.

Here are the SQL docs: https://msdn.microsoft.com/en-gb/library/ms188365.aspx

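-- The path is resolved by the SQL Server service, not by the client:
-- use a path on the server itself or a UNC share the service can read.
BULK INSERT MySchema.MyTable
FROM 'c:\myfile.csv'
WITH
  (
     FIELDTERMINATOR = ',',  -- plain delimiter: quoted fields are not parsed
     ROWTERMINATOR = '\n',   -- read as CRLF; use '0x0a' for LF-only files
     TABLOCK                 -- optional table lock: usually faster, can allow minimal logging
  );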
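If you generate the files yourself, they have to match that statement: a plain FIELDTERMINATOR does not understand quoted fields, and BULK INSERT reads ROWTERMINATOR = '\n' as a carriage return plus line feed (CRLF). A minimal Python sketch of producing such a file under those assumptions; `to_bulk_lines` and `write_bulk_file` are hypothetical helpers, not part of any library, and UTF-8 output is an assumption:

```python
def to_bulk_lines(rows):
    """Render rows as comma-separated text with CRLF line endings.

    Values are not quoted, because with a plain FIELDTERMINATOR = ','
    the quote characters would be loaded as data. Values that contain
    a field or row terminator are rejected instead.
    """
    lines = []
    for row in rows:
        # None becomes an empty field, which BULK INSERT loads as the
        # column default if one exists, otherwise NULL (KEEPNULLS forces NULL).
        values = ["" if value is None else str(value) for value in row]
        for value in values:
            if any(ch in value for ch in (",", "\r", "\n")):
                raise ValueError(f"value {value!r} contains a field or row terminator")
        lines.append(",".join(values) + "\r\n")
    return "".join(lines)


def write_bulk_file(rows, path):
    # newline="" stops Python from translating the CRLF endings.
    # UTF-8 is an assumption; check the CODEPAGE option for non-ASCII data.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(to_bulk_lines(rows))
```

Rejecting such values is deliberate: a silently split field would shift every following column. If your data really contains commas, a different terminator (or FORMAT = 'CSV' on SQL Server 2017 and later) is the usual way out.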

And the C# docs: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy%28v=vs.110%29.aspx

I've built a small tool for that: https://github.com/MikaelEliasson/EntityFramework.Utilities#batch-insert-entities (or via NuGet: https://www.nuget.org/packages/EFUtilities/).

It uses SqlBulkCopy on your in-memory lists, and it reads the EF metadata so you don't have to configure the mapping yourself. The code looks like this:

using (var ctx = new Context())
{
    EFBatchOperation.For(ctx, ctx.Locations).InsertAll(locations);
}

This is from a small demo I made https://github.com/MikaelEliasson/EFUtilitiesDemos/blob/master/BulkInsertAndUpdate/Program.cs#L46

The speed depends a lot on how big your entities are. My tests show that I can insert about 100,000 objects/s for medium-sized entities.
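As a rough reading of that figure, assuming the rate stays constant (in practice it varies with entity size) and taking 5 million rows as a hypothetical load:

$$ t \approx \frac{N}{r} = \frac{5\,000\,000\ \text{rows}}{100\,000\ \text{rows/s}} = 50\ \text{s} $$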

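If you use GUIDs, the related inserts should be fairly easy, just as you are already doing it: the keys can be generated in your own code, so the foreign keys in the child rows can be set before anything is inserted.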
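A minimal Python sketch of generating the keys client-side, assuming a hypothetical customer/order pair of tables; `build_rows` and the shape of the feed are made up for illustration. Each child row carries its parent's key before anything reaches the database, so each table's rows can be written to its own file and bulk inserted, parent table first:

```python
import uuid


def build_rows(feed):
    """Turn (customer_name, [order_totals]) pairs into rows for two tables.

    Keys are generated here with uuid4, so every order row can reference
    its customer's key without a round trip to the database.
    """
    customer_rows, order_rows = [], []
    for name, totals in feed:
        customer_id = uuid.uuid4()
        customer_rows.append((str(customer_id), name))
        for total in totals:
            # (order id, foreign key to customer, total)
            order_rows.append((str(uuid.uuid4()), str(customer_id), total))
    return customer_rows, order_rows
```

The string form produced by `str(uuid)` (e.g. `8-4-4-4-12` hex digits) is one that SQL Server accepts for a uniqueidentifier column.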

Because you have multiple inserts, I suggest you wrap them in a transaction scope. See https://github.com/MikaelEliasson/EntityFramework.Utilities/issues/26
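In .NET that would be a TransactionScope around the bulk operations. The underlying idea, that all tables commit together or not at all, can be sketched with Python's built-in sqlite3 module as a stand-in for SQL Server; the schema and the helper names `make_db` and `load_all` are hypothetical:

```python
import sqlite3


def make_db():
    """In-memory stand-in database with a parent and a child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY,"
        " customer_id TEXT NOT NULL REFERENCES customer(id), total REAL)"
    )
    return conn


def load_all(conn, customer_rows, order_rows):
    """Insert both tables in one transaction: all rows are committed or none are."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", customer_rows)
        conn.executemany(
            "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)", order_rows
        )
```

If the order insert fails (here, for example, on a foreign key pointing at a missing customer), the customer rows from the same call are rolled back too, so you never end up with half of a load.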

EDIT

If you prefer to use int or long keys, support for that will be included in the next release. It is a bit slower, but you can enable ID return for store-generated IDs.

See: https://github.com/MikaelEliasson/EntityFramework.Utilities/blob/release20/EntityFramework.Utilities/Tests/InsertTests.cs#L125

That code works now, but the release is not ready yet. If you want to try it now, you can download and build the release20 branch yourself.
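With int or long keys generated by the database, a child row cannot be built until its parent's key has been read back. A Python sketch of that dependency, using the built-in sqlite3 module as a stand-in for SQL Server and a hypothetical schema; `make_identity_db` and `load_with_identity` are illustrative names, and this is not how EFUtilities implements ID return:

```python
import sqlite3


def make_identity_db():
    """In-memory stand-in database with database-generated integer keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id), total REAL)"
    )
    return conn


def load_with_identity(conn, feed):
    """Insert (customer_name, [order_totals]) pairs, reading each new key back."""
    with conn:  # one transaction for the whole load
        for name, totals in feed:
            cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
            customer_id = cur.lastrowid  # the key the database just generated
            conn.executemany(
                "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                [(customer_id, total) for total in totals],
            )
```

In this sketch every parent costs its own round trip, which is the kind of overhead that client-generated GUIDs avoid.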
