
C# Entity Framework - Update column value in 500,000+ records

Original post: 2021-12-10 16:40:12 · Tags: c#, .net, entity-framework

We need to process 500,000 records in a database by adding a certain value to a specific column in each record.

Currently, we run multiple Tasks in parallel using the TPL. Each task takes the records in batches of 1,000, updates the values, and writes them back to the database through a DbContext. This takes around 10 minutes.

Are there more efficient ways to process large databases?

EDIT: the value we add is generated dynamically, depending on each record's information.

2 answers

Are there more efficient ways to process large databases?

Run a single SQL statement that changes all of the data at once. Don't feel that every database update has to go through entities; there is still nothing wrong with running SQL scripts directly against the back-end database. EF provides methods for running raw SQL (for example Database.ExecuteSqlCommand in EF6 and Database.ExecuteSqlRaw in EF Core), or you could have a separate "support" app that does not use EF at all and manages the data directly.

If you cannot express the change in T-SQL alone, change your approach so that it generates the T-SQL and then runs that directly. Even if the values must be calculated beforehand and differ for every record, this will be much faster than trying to push such a large dataset through Entity Framework.
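As a rough sketch of "generate the T-SQL and run it directly" when each record gets its own precomputed value, the hypothetical BulkUpdateSqlBuilder below turns (Id, NewValue) pairs into parameterized SQL Server UPDATE ... FROM (VALUES ...) statements, one per batch. It assumes SQL Server, an int key, a decimal target column, and single-part table/column names supplied by the caller; the class, record, and parameter names are illustrative only. Batches are capped at 1,000 rows because SQL Server accepts at most 2,100 parameters per request and each row uses two.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// One generated command: the T-SQL text plus the parameter values it references.
public sealed record SqlBatch(string Sql, IReadOnlyList<KeyValuePair<string, object>> Parameters);

public static class BulkUpdateSqlBuilder
{
    // SQL Server allows at most 2,100 parameters per request; each row uses two (@idN, @valN).
    public const int MaxRowsPerBatch = 1000;

    // Builds one set-based UPDATE statement per batch of rows.
    // tableName, keyColumn and valueColumn are hypothetical single-part names.
    public static List<SqlBatch> Build(
        IReadOnlyList<(int Id, decimal NewValue)> rows,
        string tableName, string keyColumn, string valueColumn,
        int batchSize = MaxRowsPerBatch)
    {
        if (batchSize < 1 || batchSize > MaxRowsPerBatch)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        string table = Quote(tableName), key = Quote(keyColumn), value = Quote(valueColumn);
        var batches = new List<SqlBatch>();

        for (int start = 0; start < rows.Count; start += batchSize)
        {
            int count = Math.Min(batchSize, rows.Count - start);
            var parameters = new List<KeyValuePair<string, object>>(count * 2);
            var values = new StringBuilder();

            for (int i = 0; i < count; i++)
            {
                var (id, newValue) = rows[start + i];
                if (i > 0) values.Append(", ");
                // Values travel as parameters, never concatenated into the SQL text.
                values.Append($"(@id{i}, @val{i})");
                parameters.Add(new KeyValuePair<string, object>($"@id{i}", id));
                parameters.Add(new KeyValuePair<string, object>($"@val{i}", newValue));
            }

            string sql =
                $"UPDATE t SET t.{value} = v.NewValue " +
                $"FROM {table} AS t " +
                $"INNER JOIN (VALUES {values}) AS v(Id, NewValue) ON t.{key} = v.Id;";
            batches.Add(new SqlBatch(sql, parameters));
        }
        return batches;
    }

    // Wraps an identifier in brackets, escaping any closing bracket it contains.
    private static string Quote(string identifier) =>
        "[" + identifier.Replace("]", "]]") + "]";
}
```

Each SqlBatch could then be executed as its own command, for example by adding every pair to a SqlCommand via Parameters.AddWithValue, optionally with all batches inside one transaction if the update must be atomic. Compared with tracking 500,000 entities, the database does the row matching itself and the client only ships keys and values.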

The architecture and design of your codebase play a key role here. Problems like this are exactly why we cleanly separate domain logic from data access logic: the processing and computation of business rules and values should not dictate how you persist them, and vice versa.

For example, if you retrieve 500,000 business entities from a repository class and compute all of their new values, you can then simply enumerate them and either generate the required SQL or pass the new values and identities to your data access layer, which performs an optimized bulk update.
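A minimal sketch of that separation, under stated assumptions: the domain side computes each new value with no database dependency, and hands only (Id, NewValue) pairs to a data access abstraction. RecordInfo, IRecordValueWriter, AmountAdjuster, the increment rule, and InMemoryWriter are all hypothetical names invented for illustration; InMemoryWriter stands in for a real implementation that would issue set-based SQL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain entity: only the fields the computation needs.
public sealed record RecordInfo(int Id, decimal Amount, int Category);

// Data access abstraction: receives (Id, NewValue) pairs and decides how to persist them
// (EF, raw T-SQL, SqlBulkCopy into a staging table, ...). The domain layer never knows.
public interface IRecordValueWriter
{
    void BulkUpdateAmounts(IReadOnlyList<(int Id, decimal NewValue)> updates);
}

// Domain logic: pure computation plus batching, no database dependency.
public static class AmountAdjuster
{
    // Hypothetical business rule: the increment depends on the record's own data.
    public static decimal ComputeNewAmount(RecordInfo r) =>
        r.Amount + (r.Category == 1 ? 5m : 1m);

    public static void Process(IEnumerable<RecordInfo> records, IRecordValueWriter writer, int batchSize = 1000)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<(int Id, decimal NewValue)>(batchSize);
        foreach (var r in records)
        {
            batch.Add((r.Id, ComputeNewAmount(r)));
            if (batch.Count == batchSize)
            {
                writer.BulkUpdateAmounts(batch);
                // Start a fresh list rather than Clear(), in case the writer keeps a reference.
                batch = new List<(int Id, decimal NewValue)>(batchSize);
            }
        }
        if (batch.Count > 0) writer.BulkUpdateAmounts(batch);
    }
}

// In-memory stand-in for the data access layer, useful for testing the domain logic.
public sealed class InMemoryWriter : IRecordValueWriter
{
    public Dictionary<int, decimal> Stored { get; } = new();
    public int Calls { get; private set; }

    public void BulkUpdateAmounts(IReadOnlyList<(int Id, decimal NewValue)> updates)
    {
        Calls++;
        foreach (var (id, value) in updates) Stored[id] = value;
    }
}
```

Because the computation only depends on the interface, the persistence strategy can be swapped for a bulk SQL implementation without touching the business rule.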

This answer does not tie itself to one specific implementation, because there are many ways to build a solution using the suggested approach.

It is still important to understand that Entity Framework was designed around the unit-of-work concept (at least EF6) and is not optimized for bulk workloads, except for select queries. A well-designed codebase will typically mix EF and T-SQL in the data access layer (or in the database, via functions and stored procedures) to handle performance-critical operations.