EF Core multithreading deadlock + BeginTransaction + Commit

Question

I have some questions regarding how SaveChangesAsync() and BeginTransaction() + transaction.Commit() work.

My team has a .NET Core worker that receives events from Microsoft EventHub and saves data into SQL Server via EF Core 3.
One of the event types carries a lot of data, so we created a few tables, split the data, and save it into these tables. The child tables reference the parent table's id column (FK_Key).
Under certain conditions, some data in the DB has to be deleted before new data is saved, so we delete -> upsert data.

To save data into the DB, we call dbContext.Database.BeginTransaction() and transaction.Commit(). When we run the worker, we get a deadlock exception like: Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

I found that one of the BatchDeleteAsync() calls in PurgeDataInChildTables() or one of the BulkInsertOrUpdateAsync() calls in Upsert() throws the deadlock exception (which one changes every time I run the worker).

Here is the code:

public async Task DeleteAndUpsert(List<MyEntity> entitiesToDelete, List<MyEntity> entitiesToUpsert)
{
    if (entitiesToDelete.Any())
        await myRepository.Delete(entitiesToDelete);

    if (entitiesToUpsert.Any())
        await myRepository.Upsert(entitiesToUpsert);
}


public override async Task Upsert(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        using (var transaction = dbContext.Database.BeginTransaction())
        {
            await PurgeDataInChildTables(entities, dbContext);
            await dbContext.BulkInsertOrUpdateAsync(entities);
            // tables that depend on the parent table (FK_Key)
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child1).ToList());
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child2).ToList());
            await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child3).ToList());
            transaction.Commit();
        }
    }
}

public override async Task Delete(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        using (var transaction = dbContext.Database.BeginTransaction())
        {
            await PurgeDataInChildTables(entities, dbContext);
            await dbContext.BulkDeleteAsync(entities);
            transaction.Commit();
        }
    }
}

private async Task PurgeDataInChildTables(IList<MyEntity> entities, MyDbContext dbContext)
{
    var ids = entities.Select(x => x.Id).ToList();

    await dbContext.Child1.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
    await dbContext.Child2.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
    await dbContext.Child3.Where(x => ids.Contains(x.Id)).BatchDeleteAsync();
}

When the worker starts up, it creates four threads, and they all upsert into (and delete from) the same tables. So I assume the deadlock occurs when one thread starts a transaction, another thread starts another transaction (or something similar), and both then try to upsert into (or delete from) the child tables.
I tried a few things to solve the issue and noticed that the deadlock seems to go away when I remove BeginTransaction() and use SaveChangesAsync() instead.

Here is the modified code:

public override async Task Upsert(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        await PurgeDataInChildTables(entities, dbContext);
        await dbContext.BulkInsertOrUpdateAsync(entities);
        // tables that depend on the parent table (FK_Key)
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child1).ToList());
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child2).ToList());
        await dbContext.BulkInsertOrUpdateAsync(entities.SelectMany(x => x.Child3).ToList());
        await dbContext.SaveChangesAsync();
    }
}

public override async Task Delete(IList<MyEntity> entities)
{
    using (var dbContext = new MyDbContext(DbContextOptions, DbOptions))
    {
        await PurgeDataInChildTables(entities, dbContext);
        await dbContext.BulkDeleteAsync(entities);
        await dbContext.SaveChangesAsync();
    }
}

The deadlock was occurring about 30 seconds after the worker started up, but it didn't occur for 2 to 3 minutes after I modified the code, so I think the issue is resolved, though it might still occur if I run the worker longer.

Finally, here are my questions:

When I use BeginTransaction() + .Commit(), the deadlock occurs, but it doesn't happen when I use SaveChangesAsync(). Why is that?
What is the difference between the two approaches in terms of transactions?
If the modified code could still cause a deadlock, or is not a good solution, how do I solve it?

Answer 1

It is hard to say precisely without looking at a profiling session of the database. What needs to be checked there is which kinds of locks are taken (where they are shared and where they are exclusive or update) and when transactions are actually opened. I will describe a theoretical behaviour that needs to be confirmed with actual database profiling.

When you wrap everything with Database.BeginTransaction():
The isolation level isn't set by EF; it uses the database's default isolation level. For Microsoft SQL Server that is Read committed. At this isolation level concurrent transactions can read data, but if a modification is in progress, other transactions wait for it to complete, even if they only want to read. The transaction is held until Commit() is called.

When you don't specify a transaction explicitly:
Select statements and SaveChangesAsync each run in their own transaction, with the same database-default isolation level. A transaction isn't held longer than it needs to be: in the case of SaveChangesAsync, for example, it lasts from the moment the method is called until all changes are written.

Transaction (Process ID 71) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This message appears when several transactions compete for the same resources and block each other, for example when one of them is trying to read data and another is trying to modify it. To break the deadlock, the database kills the transaction that requires the fewest resources to roll back. In your case, that is a transaction that tries to read. Reads are lightweight in terms of rollback cost.
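
Since the error message itself says "Rerun the transaction", one option is to wrap the whole unit of work in a small retry helper. The following is only a minimal sketch, not something from the question's code base: the names DeadlockRetry, RunAsync, isDeadlock, maxAttempts and baseDelayMs are made up for illustration. It assumes the operation opens a fresh DbContext and transaction on every attempt, as Upsert and Delete in the question do.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of EF Core or any bulk extension library).
public static class DeadlockRetry
{
    // SQL Server reports a deadlock victim with error number 1205.
    public const int SqlServerDeadlockErrorNumber = 1205;

    // Runs the operation and reruns it when isDeadlock says the failure was a deadlock.
    // After maxAttempts failed attempts the last exception propagates to the caller.
    public static async Task RunAsync(
        Func<Task> operation,
        Func<Exception, bool> isDeadlock,
        int maxAttempts = 3,
        int baseDelayMs = 100)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await operation();
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && isDeadlock(ex))
            {
                // Back off a little longer after each failed attempt.
                await Task.Delay(baseDelayMs * attempt);
            }
        }
    }
}
```

With Microsoft.Data.SqlClient, the predicate could be something like ex => ex is SqlException sql && sql.Number == DeadlockRetry.SqlServerDeadlockErrorNumber, and the operation () => myRepository.Upsert(entities). Whether the bulk extension surfaces the SqlException directly or wraps it is an assumption you would need to check.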

Summarizing:
When one big lock holds a resource for a long time, it stops other workers from accessing that resource, because the database kills the other workers' transactions when they try to read, probably at the var ids = entities.Select(x => x.Id).ToList(); point. When you rewrote your code, you got rid of the long-held locks. What's more, as far as I can see from the documentation for BulkInsertOrUpdateAsync, this extension uses its own internal transaction on each call, neither affecting nor involving the EF context. If so, the actual transactions live even shorter than a single SaveChangesAsync call would if the data were changed the usual EF way rather than through the extension.
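
To make the point about long-held locks concrete, here is a toy model in plain C# with no database involved. It is only an illustration under loud assumptions: the two SemaphoreSlim objects stand in for locks on a parent and a child table, a timeout stands in for SQL Server choosing a deadlock victim, and the opposite lock order of the two workers is hypothetical, since the real lock order in the question is unknown without profiling. LockHoldDemo and all names inside it are invented for this sketch.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Toy model only: semaphores play the role of table locks.
public static class LockHoldDemo
{
    // Returns true when both workers finish, false when at least one gives up waiting.
    public static bool Run(bool holdUntilCommit, int timeoutMs = 200)
    {
        using var parentLock = new SemaphoreSlim(1, 1);
        using var childLock = new SemaphoreSlim(1, 1);
        // Makes both workers finish their first statement before either starts its second.
        using var afterFirstStatement = new Barrier(2);

        bool Worker(SemaphoreSlim first, SemaphoreSlim second)
        {
            first.Wait();                            // statement 1 locks its table
            if (!holdUntilCommit) first.Release();   // per-statement transaction: released at once
            afterFirstStatement.SignalAndWait();
            var acquired = second.Wait(timeoutMs);   // statement 2 needs the other table
            if (acquired) second.Release();
            if (holdUntilCommit) first.Release();    // explicit transaction: released at "commit"
            return acquired;
        }

        // The two workers touch the tables in opposite order (hypothetical).
        var a = Task.Run(() => Worker(parentLock, childLock));
        var b = Task.Run(() => Worker(childLock, parentLock));
        Task.WaitAll(a, b);
        return a.Result && b.Result;
    }
}
```

Calling LockHoldDemo.Run(holdUntilCommit: true) models the BeginTransaction() + Commit() version: each worker keeps its first lock while waiting for the second, so at least one of them times out. Run(holdUntilCommit: false) models short per-call transactions, where no worker holds one lock while waiting for another. Note that the second mode also gives up atomicity across the calls, which is the trade-off of removing the explicit transaction.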
