
Entity Framework: how to improve bulk update performance?

I have some code that does some calculations and, based on the results, updates a column in one table with a new value. It's fast in the beginning, but over time it takes longer and longer (performance seems to degrade exponentially over time).

Is there a way to improve performance? By manually specifying what needs to be updated, or something similar?

(So far my best way to tackle this issue has been to create a stored procedure that works as a bulk update, but I'm wondering if there is a native way of doing this in Entity Framework.)

My code is something like:

public void UpdateValues()
{
    var itemsPerBag = _dbContext.Items
                                .Where(i => i.needsToBeRecalculated)
                                .GroupBy(i => i.BagId);

    foreach (var bag in itemsPerBag)
    {
        CalculateValue(bag);
    }

    _dbContext.SaveChanges();
}

public void CalculateValue(IEnumerable<Item> bag)
{
    foreach (var item in bag)
    {
        item.calculatedValue = CalculateValue(item);
    }
}

It is not literally this, but I'm doing my updates per "group" rather than one by one, to keep each commit neither too big nor too small.

I have around 850 "bags"/saves and 25,000 items. This is taking 1 minute to do 11,000 updates and 4 minutes to do all 25,000 updates.

I think this is a rather small amount of data that should be processed much more quickly; the calculations I'm doing are very simple.

EDIT:

The only way I've managed to improve performance, from 4 minutes down to 20 seconds, was to create a stored procedure in the database that updates the data, and call it instead of SaveChanges().

private async Task UpdatePlanItems(IEnumerable<Item> items)
{
    // Structured parameter carrying all rows in a single round trip
    // (requires System.Data and System.Data.SqlClient)
    var param = new SqlParameter
    {
        ParameterName = "@Items",
        SqlDbType = SqlDbType.Structured,
        Value = GetItemsTable(items),
        TypeName = "dbo.ItemUpdateType"
    };

    // EF6 API; on EF Core use _dbContext.Database.ExecuteSqlRawAsync instead
    await _dbContext.Database.ExecuteSqlCommandAsync("EXEC dbo.usp_UpdateItemValue @Items", param);
}

private DataTable GetItemsTable(IEnumerable<Item> items)
{
    var table = new DataTable();
    table.Columns.Add("ItemId", typeof(int));
    table.Columns.Add("Value", typeof(int));

    foreach (var item in items)
    {
        var row = table.NewRow();
        row["ItemId"] = item.ItemId;
        row["Value"] = item.Value;
        table.Rows.Add(row);
    }

    return table;
}
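
Wired into the original flow, the call site might look like the following minimal sketch (the item.Value property and the per-item recalculation step are assumptions based on the snippets above):

var items = _dbContext.Items
                      .Where(i => i.needsToBeRecalculated)
                      .ToList();

// Recalculate everything in memory first, then send one set-based update.
foreach (var item in items)
{
    item.Value = CalculateValue(item);
}

// One round trip via the table-valued parameter instead of one UPDATE per row
await UpdatePlanItems(items);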

On the database I had to run this:

CREATE TYPE [dbo].[ItemUpdateType] AS TABLE(
              [ItemId] [int] NULL,
              [Value] [int] NULL
)
GO

CREATE PROCEDURE [dbo].[usp_UpdateItemValue]
    (@PlanItems [dbo].[ItemUpdateType] READONLY)
AS
BEGIN
    -- Set-based update: join the incoming table-valued parameter to the target table
    UPDATE i
    SET i.Value = s.Value
    FROM [dbo].[Item] i
    INNER JOIN @PlanItems s ON s.ItemId = i.ItemId
END

You should not save changes after every change; rather, do it once after all the changes are done (or at least in batches of 100/1000/...), so your code should look like this. Otherwise you're making n DB calls (one per item) instead of just one (for all items). A batched variant is sketched after the snippet.

public void UpdateValues()
{
    var itemsPerBag = _dbContext.Items.Where(i => i.needsToBeRecalculated)
                                      .GroupBy(i => i.BagId);

    foreach (var bag in itemsPerBag)
    {
        CalculateValue(bag);
    }

    _dbContext.SaveChanges();
}
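
If one big SaveChanges() turns out to be too large a unit of work, a batched variant might look like this minimal sketch (the batch size of 1000 is an arbitrary assumption; tune it for your workload):

public void UpdateValuesInBatches()
{
    const int batchSize = 1000; // assumed value; tune for your workload

    // AsEnumerable() groups in memory, which behaves the same on EF6 and EF Core
    var itemsPerBag = _dbContext.Items
                                .Where(i => i.needsToBeRecalculated)
                                .AsEnumerable()
                                .GroupBy(i => i.BagId);

    var pendingChanges = 0;
    foreach (var bag in itemsPerBag)
    {
        var items = bag.ToList();
        CalculateValue(items);
        pendingChanges += items.Count;

        // Flush once enough modified entities have accumulated
        if (pendingChanges >= batchSize)
        {
            _dbContext.SaveChanges();
            pendingChanges = 0;
        }
    }

    // Flush the remainder
    _dbContext.SaveChanges();
}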

Also, having too many changes without committing to the DB (not going in batches) may still be slow, especially if you have a lot of items in the Modified state. You may want to disable automatic change detection, since it always checks every tracked item, and instead run change detection manually at the end. You may also need to re-enable automatic change detection afterwards if you're sharing the DbContext instance (which you should not).

// Turn off automatic change detection
// (EF6 API; on EF Core the flag lives at _dbContext.ChangeTracker.AutoDetectChangesEnabled)
_dbContext.Configuration.AutoDetectChangesEnabled = false;

// All your operations (calculation/updating/adding items/...)
AllYourUpdatesToItems();

// Manually call DetectChanges so EF's SaveChanges() actually commits something
_dbContext.ChangeTracker.DetectChanges();
_dbContext.SaveChanges();

Just try this: you don't need to call SaveChanges() after recalculating each item. It is enough to call it once, after all the recalculation is done.

var itemsPerBag = _dbContext.Items.Where(i => i.needsToBeRecalculated)
                                  .GroupBy(i => i.BagId)
                                  .ToArray();

foreach (var bag in itemsPerBag)
{
    CalculateValue(bag);
}

_dbContext.SaveChanges();

I have developed a project that benchmarks the bulk insert methods in EF and other ORMs in .NET.

Check it out: GitHub Link

In this project, I implemented bulk insert in different ways and measured the elapsed time for each technique.

Four different techniques for bulk insert, with Entity Framework and without it:

1- EFCore.BulkExtensions => https://github.com/borisdj/EFCore.BulkExtensions (see the sketch after this list)

2- Bulk-Operations => https://github.com/zzzprojects/Bulk-Operations

3- EF Core AddRange => https://github.com/dotnet/efcore

4- Microsoft SqlBulkCopy => https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?view=dotnet-plat-ext-5.0
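
Since this question is about updates rather than inserts, here is a minimal sketch of the first technique applied to it, assuming the EFCore.BulkExtensions NuGet package is installed and Item is a tracked entity type:

using EFCore.BulkExtensions;

// Recalculate in memory, then push every row back in one bulk operation.
var items = _dbContext.Items
                      .Where(i => i.needsToBeRecalculated)
                      .ToList();

foreach (var item in items)
{
    item.calculatedValue = CalculateValue(item);
}

// BulkUpdate bypasses the change tracker and writes all rows at once
_dbContext.BulkUpdate(items);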

EF Core 7 now supports bulk update and delete natively through ExecuteUpdate and ExecuteDelete. See the details here.
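
As a rough sketch: this only works when the new value can be written as a SQL-translatable expression, since ExecuteUpdate runs entirely server-side; the i.Quantity * i.UnitPrice expression below is a hypothetical stand-in for the real calculation.

// EF Core 7+: issues a single UPDATE statement, no change tracking involved.
// The SetProperty expression must be translatable to SQL;
// Quantity and UnitPrice are assumed columns standing in for the real formula.
_dbContext.Items
          .Where(i => i.needsToBeRecalculated)
          .ExecuteUpdate(setters => setters
              .SetProperty(i => i.calculatedValue, i => i.Quantity * i.UnitPrice));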
