
EF Codefirst Bulk Insert

I need to insert around 2500 rows using EF Code First.

My original code looked something like this:

foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    context.MyStuff.Add(item);
}

This took a very long time. It was around 2.2 seconds for each DbSet.Add() call, which equates to around 90 minutes.

I refactored the code to this:

var tempItemList = new List<MyStuff>();
foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    tempItemList.Add(item);
}
context.MyStuff.ToList().AddRange(tempItemList);

This only takes around 4 seconds to run. However, the .ToList() queries all the items currently in the table, which is extremely unnecessary and could be dangerous, or even more time consuming, in the long run. One workaround would be to do something like context.MyStuff.Where(x => x.ID == *empty guid*).AddRange(tempItemList), because then I know nothing will ever be returned.

But I'm curious whether anyone else knows of an efficient way to do a bulk insert using EF Code First?

Validation is normally a very expensive part of EF; I saw great performance improvements by disabling it with:

context.Configuration.AutoDetectChangesEnabled = false;
context.Configuration.ValidateOnSaveEnabled = false;

I believe I found that in a similar SO question; perhaps it was this answer.

Another answer on that question rightly points out that if you really need bulk-insert performance, you should look at using System.Data.SqlClient.SqlBulkCopy. The choice between EF and ADO.NET for this issue really revolves around your priorities.

I have a crazy idea, but I think it will help you.

After every 100 items added, call SaveChanges(). I have a feeling that change tracking in EF performs very badly with huge amounts of data.
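As a sketch of that idea (reusing the context and list names from the question, which are assumptions about the asker's code), committing every 100 items keeps each SaveChanges call cheap:

```csharp
// Sketch only: "context", "MyStuff" and "listOfItemsToBeAdded" are the
// names from the question. Committing every 100 Adds means the change
// tracker never has to diff a large entity graph in one go.
int pending = 0;
foreach (var item in listOfItemsToBeAdded)
{
    //biz logic
    context.MyStuff.Add(item);
    if (++pending % 100 == 0)
        context.SaveChanges();
}
context.SaveChanges(); // flush the remainder (< 100 items)
```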

I would recommend this article on how to do bulk inserts using EF.

Entity Framework and slow bulk INSERTs

He explores these areas and compares performance:

  1. Default EF (57 minutes to complete adding 30,000 records)
  2. Replacing with ADO.NET code (25 seconds for those same 30,000)
  3. Context bloat: keep the active context graph small by using a new context for each unit of work (the same 30,000 inserts take 33 seconds)
  4. Large lists: turn off AutoDetectChangesEnabled (brings the time down to about 20 seconds)
  5. Batching (down to 16 seconds)
  6. DbTable.AddRange() (performance is in the 12-second range)
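Techniques 3, 4, and 5 combine naturally. A hedged sketch (the type names MyDbContext and MyStuff are assumptions for illustration, not from the article):

```csharp
// Sketch: a fresh context per batch (item 3), AutoDetectChangesEnabled
// off (item 4), and batched SaveChanges (item 5). MyDbContext, MyStuff
// and itemsToAdd are assumed names.
const int batchSize = 1000;
for (int offset = 0; offset < itemsToAdd.Count; offset += batchSize)
{
    using (var db = new MyDbContext())
    {
        db.Configuration.AutoDetectChangesEnabled = false;
        foreach (var item in itemsToAdd.Skip(offset).Take(batchSize))
            db.MyStuff.Add(item);
        db.SaveChanges(); // one round of change detection per batch
    }
}
```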

EF is not really usable for batch/bulk operations (I think ORMs in general are not).

The particular reason this is running so slowly is the change tracker in EF. Virtually every call to the EF API results in a call to DetectChanges() internally, including DbSet.Add(). When you add 2500 items, this function gets called 2500 times, and each call gets slower the more data you have added. So disabling change tracking in EF should help a lot:

dataContext.Configuration.AutoDetectChangesEnabled = false;

A better solution would be to split your big bulk operation into 2500 smaller transactions, each running with its own data context. You could use MSMQ, or some other mechanism for reliable messaging, to initiate each of these smaller transactions.

But if your system is built around a lot of bulk operations, I would suggest finding a different solution than EF for your data access layer.

As STW pointed out, the DetectChanges method, called every time you call the Add method, is VERY expensive.

Common solutions are:

  • Use AddRange over Add
  • Set AutoDetectChangesEnabled to false
  • Split SaveChanges into multiple batches

See: Improve Entity Framework Add Performance

It's important to note that using AddRange doesn't perform a bulk insert; it simply invokes the DetectChanges method once (after all entities are added), which greatly improves performance.
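Concretely, using the names from the question, the Add loop collapses to a single AddRange call:

```csharp
// One AddRange call instead of 2500 Add calls: DetectChanges runs once,
// after all entities are attached, rather than once per entity.
context.MyStuff.AddRange(tempItemList);
context.SaveChanges();
```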

But I'm curious if anyone else knows of an efficient way to do a bulk insert using EF Code First

There are some third-party libraries supporting bulk insert available:

See: Entity Framework Bulk Insert library


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all the bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

Although this is a late reply, I'm posting the answer because I suffered the same pain. I've created a new GitHub project just for that; as of now, it supports bulk insert/update/delete for SQL Server transparently using SqlBulkCopy.

https://github.com/MHanafy/EntityExtensions

There are other goodies as well, and hopefully it will be extended to do more down the track.

Using it is as simple as:

var insertsAndupdates = new List<object>();
var deletes = new List<object>();
context.BulkUpdate(insertsAndupdates, deletes);

Hope it helps!

EF6 beta 1 has an AddRange function that may suit your purpose:

INSERTing many rows with Entity Framework 6 beta 1

EF6 will be released "this year" (2013)

While this is a bit late, and the answers and comments posted above are very useful, I will just leave this here and hope it proves useful for people who had the same problem as I did and come to this post for answers. This post still ranks high on Google (at the time of posting this answer) if you search for a way to bulk-insert records using Entity Framework.

I had a similar problem using Entity Framework and Code First in an MVC 5 application. I had a user submit a form that caused tens of thousands of records to be inserted into a table. The user had to wait for more than two and a half minutes while 60,000 records were being inserted.

After much googling, I stumbled upon BulkInsert-EF6, which is also available as a NuGet package. Reworking the OP's code:

var tempItemList = new List<MyStuff>();
foreach(var item in listOfItemsToBeAdded)
{
    //biz logic
    tempItemList.Add(item);
}

using (var transaction = context.Transaction())
{
    try
    {
        context.BulkInsert(tempItemList);
        transaction.Commit();
    }
    catch (Exception ex)
    {
        // Handle exception
        transaction.Rollback();
    }
}

My code went from taking more than 2 minutes to less than 1 second for 60,000 records. For reference, here is a plain SqlBulkCopy helper that achieves the same thing without a third-party package (Db is assumed to be wherever your application exposes its SqlConnection):

    // Bulk-inserts the given list into the destination table via SqlBulkCopy.
    public static void BulkInsert(IList list, string tableName)
    {
        var conn = (SqlConnection)Db.Connection;
        if (conn.State != ConnectionState.Open) conn.Open();

        using (var bulkCopy = new SqlBulkCopy(conn))
        {
            bulkCopy.BatchSize = list.Count;
            bulkCopy.DestinationTableName = tableName;

            var table = ListToDataTable(list);
            bulkCopy.WriteToServer(table);
        }
    }

    // Converts a list of POCOs to a DataTable, one column per public property.
    public static DataTable ListToDataTable(IList list)
    {
        var dt = new DataTable();
        if (list.Count <= 0) return dt;

        // Build columns from the first item's properties, unwrapping Nullable<T>.
        var properties = list[0].GetType().GetProperties();
        foreach (var pi in properties)
        {
            dt.Columns.Add(pi.Name, Nullable.GetUnderlyingType(pi.PropertyType) ?? pi.PropertyType);
        }

        // Copy each item's property values into a row; nulls become DBNull.
        foreach (var item in list)
        {
            DataRow row = dt.NewRow();
            properties.ToList().ForEach(p => row[p.Name] = p.GetValue(item, null) ?? DBNull.Value);
            dt.Rows.Add(row);
        }
        return dt;
    }
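A hypothetical call site for the helpers above ("dbo.MyStuff" is an assumed destination table name; SqlBulkCopy matches columns by ordinal by default, so the POCO's properties must line up with the table's columns, or you can add ColumnMappings):

```csharp
// Hypothetical usage; the table name and entity type are assumptions.
var items = new List<MyStuff>();
// ... populate items ...
BulkInsert(items, "dbo.MyStuff");
```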
