EF Code First Bulk Insert

I need to insert around 2500 rows using EF Code First.

My original code looked something like this:

foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    context.MyStuff.Add(item);
}

This took a very long time: around 2.2 seconds for each DbSet.Add() call, which equates to around 90 minutes.

I refactored the code to this:

var tempItemList = new List<MyStuff>();
foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    tempItemList.Add(item);
}
context.MyStuff.ToList().AddRange(tempItemList);

This only takes around 4 seconds to run. However, the .ToList() queries all the items currently in the table, which is entirely unnecessary and could be dangerous, or even more time consuming, in the long run. One workaround would be to do something like context.MyStuff.Where(x => x.ID == *empty guid*).AddRange(tempItemList), because then I know nothing will ever be returned.

But I'm curious if anyone else knows of an efficient way to do a bulk insert using EF Code First?

Validation is normally a very expensive part of EF; I had great performance improvements after disabling it with:

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;
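
If the same context instance lives on after the bulk add, a common precaution (a minimal sketch, not part of the original answer) is to restore the flags in a finally block:

try
{
    context.Configuration.AutoDetectChangesEnabled = false;
    context.Configuration.ValidateOnSaveEnabled = false;

    // ... add entities here ...

    context.SaveChanges();
}
finally
{
    // Restore the defaults so later operations on this context behave normally.
    context.Configuration.AutoDetectChangesEnabled = true;
    context.Configuration.ValidateOnSaveEnabled = true;
}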

I believe I found that in a similar SO question -- perhaps it was this answer.

Another answer on that question rightly points out that if you really need bulk insert performance you should look at using System.Data.SqlClient.SqlBulkCopy. The choice between EF and ADO.NET for this issue really revolves around your priorities.

I have a crazy idea, but I think it will help you.

After every 100 items added, call SaveChanges. I have a feeling that change tracking in EF performs very badly with huge data sets.
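
A rough sketch of that idea, reusing the names from the question (the batch size of 100 is arbitrary and worth tuning):

int count = 0;
foreach (var item in listOfItemsToBeAdded)
{
    //biz logic
    context.MyStuff.Add(item);

    // Flush every 100 items so the tracked graph stays small.
    if (++count % 100 == 0)
        context.SaveChanges();
}
context.SaveChanges(); // flush the final partial batch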

I would recommend this article on how to do bulk inserts using EF.

Entity Framework and slow bulk INSERTs

He explores these areas and compares performance (a sketch combining several of them appears after the list):

  1. Default EF (57 minutes to complete adding 30,000 records)
  2. Replacing with ADO.NET code (25 seconds for those same 30,000)
  3. Context Bloat - keep the active context graph small by using a new context for each Unit of Work (the same 30,000 inserts take 33 seconds)
  4. Large Lists - turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)
  5. Batching (down to 16 seconds)
  6. DbTable.AddRange() (performance is in the 12-second range)
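
As a rough sketch (not the article's code), here are several of these techniques combined: a new context per batch, AutoDetectChangesEnabled turned off, batching, and AddRange. MyContext and itemsToAdd are assumed names for an EF6 DbContext subclass and a List<MyStuff>; Skip/Take need System.Linq.

const int batchSize = 1000; // arbitrary; tune for your workload
for (int i = 0; i < itemsToAdd.Count; i += batchSize)
{
    // A fresh context per batch keeps the change-tracking graph small.
    using (var batchContext = new MyContext())
    {
        batchContext.Configuration.AutoDetectChangesEnabled = false;
        batchContext.MyStuff.AddRange(itemsToAdd.Skip(i).Take(batchSize));
        batchContext.SaveChanges();
    }
}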

EF is not really usable for batch/bulk operations (I think ORMs in general are not).

The particular reason this runs so slowly is the change tracker in EF. Virtually every call to the EF API results in an internal call to DetectChanges(), including DbSet.Add(). When you add 2,500 items, this function gets called 2,500 times, and each call gets slower the more data you have added. So disabling change tracking in EF should help a lot:

dataContext.Configuration.AutoDetectChangesEnabled = false;

A better solution would be to split your big bulk operation into smaller transactions, each running with its own data context. You could use MSMQ, or some other mechanism for reliable messaging, to initiate each of these smaller transactions.

But if your system is built around a lot of bulk operations, I would suggest finding a different solution than EF for your data access layer.

As STW pointed out, the DetectChanges method, called every time you call the Add method, is VERY expensive.

Common solutions are:

  • Use AddRange over Add
  • Set AutoDetectChanges to false
  • Split SaveChanges into multiple batches

See: Improve Entity Framework Add Performance

It's important to note that using AddRange doesn't perform a bulk insert; it simply invokes the DetectChanges method once (after all entities are added), which greatly improves performance.
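
For example, the question's loop rewritten with these fixes (names taken from the question; AddRange requires EF6):

var tempItemList = new List<MyStuff>();
foreach (var item in listOfItemsToBeAdded)
{
    //biz logic
    tempItemList.Add(item);
}

context.Configuration.AutoDetectChangesEnabled = false;
context.MyStuff.AddRange(tempItemList); // DetectChanges runs once, not per item
context.SaveChanges();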

But I'm curious if anyone else knows of an efficient way to do a bulk insert using EF Code First

There are some third-party libraries supporting bulk insert available:

See: Entity Framework Bulk Insert library


Disclaimer: I'm the owner of Entity Framework Extensions.

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

Although this is a late reply, I'm posting the answer because I suffered the same pain. I've created a new GitHub project just for that; as of now, it supports bulk insert/update/delete for SQL Server transparently using SqlBulkCopy.

https://github.com/MHanafy/EntityExtensions

There are other goodies as well, and hopefully it will be extended to do more down the track.

Using it is as simple as:

var insertsAndupdates = new List<object>();
var deletes = new List<object>();
context.BulkUpdate(insertsAndupdates, deletes);

Hope it helps!

EF6 beta 1 has an AddRange function that may suit your purpose:

INSERTing many rows with Entity Framework 6 beta 1

EF6 will be released "this year" (2013)

While this is a bit late, and the answers and comments posted above are very useful, I will just leave this here in the hope that it proves useful for people who had the same problem as I did and come to this post for answers. This post still ranks high on Google (at the time of posting this answer) if you search for a way to bulk-insert records using Entity Framework.

I had a similar problem using Entity Framework and Code First in an MVC 5 application. I had a user submit a form that caused tens of thousands of records to be inserted into a table. The user had to wait for more than 2 and a half minutes while 60,000 records were being inserted.

After much googling, I stumbled upon BulkInsert-EF6, which is also available as a NuGet package. Reworking the OP's code:

var tempItemList = new List<MyStuff>();
foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    tempItemList.Add(item);
}

using (var transaction = context.Database.BeginTransaction())
{
    try
    {
        context.BulkInsert(tempItemList);
        transaction.Commit();
    }
    catch (Exception ex)
    {
        // Handle exception
        transaction.Rollback();
    }
}

My code went from taking >2 minutes to <1 second for 60,000 records.
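
The helper below shows the underlying SqlBulkCopy approach without EF: it converts any IList to a DataTable via reflection and streams it to the destination table.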

    // Requires System.Data, System.Data.SqlClient and System.Linq.
    // Db.Connection is the answerer's own data-access wrapper around the
    // underlying SqlConnection; substitute your context's connection.
    public static void BulkInsert(IList list, string tableName)
    {
        var conn = (SqlConnection)Db.Connection;
        if (conn.State != ConnectionState.Open) conn.Open();

        using (var bulkCopy = new SqlBulkCopy(conn))
        {
            bulkCopy.BatchSize = list.Count;
            bulkCopy.DestinationTableName = tableName;

            // Without explicit ColumnMappings, SqlBulkCopy maps columns by
            // ordinal, so the property order must match the table's column order.
            var table = ListToDataTable(list);
            bulkCopy.WriteToServer(table);
        }
    }

    // Builds a DataTable whose columns mirror the public properties of the
    // list's element type, unwrapping Nullable<T> so DataColumn accepts it.
    public static DataTable ListToDataTable(IList list)
    {
        var dt = new DataTable();
        if (list.Count <= 0) return dt;

        var properties = list[0].GetType().GetProperties();
        foreach (var pi in properties)
        {
            dt.Columns.Add(pi.Name, Nullable.GetUnderlyingType(pi.PropertyType) ?? pi.PropertyType);
        }

        foreach (var item in list)
        {
            DataRow row = dt.NewRow();
            // Nulls must be stored as DBNull.Value in a DataRow.
            properties.ToList().ForEach(p => row[p.Name] = p.GetValue(item, null) ?? DBNull.Value);
            dt.Rows.Add(row);
        }
        return dt;
    }
