
High-performance inserts and duplicates control with EF

I know the basics of SQL, so I have learned to create insert queries like the following one:

queryAccount.AppendLine(
    string.Format(
        "Insert INTO Account(Number_Account, DoID, ClientID) Select "
            + "{0}, "
            + "(Select id From AccountDO Where Number_do = {1}), "
            + "(Select id From Client Where Number_Client = {2})"
            + "Where not exists(Select * From Account Where Number_Account = {0});",
        item.Client.NumberAccount,
        item.Client.NumberDo,
        item.Client.NumberClient));

In that query I add data to an "Account" table that has two FKs (DoID and ClientID), and I also check whether that account already exists. Usually, to insert data from a flat file, I use a StringBuilder to create multiple insert queries.

This works well in some projects with modest requirements, but now I have a bigger challenge on my hands. I need to create a web site that imports new data on a daily basis, so it's important that the "import module" follows best practices.

What I have done so far:

  • My database model: around 20 tables with all kinds of relationships (many-to-many included);
  • Used the Entity Model Code Generator to generate the corresponding model in the ASP.NET project.

What I need to do:

  • Perform a bulk data insertion; the way I'm doing it is slow and probably not safe;
  • Insert that data while avoiding duplicates and using the proper IDs from the foreign tables.

That is why I need your help: how can I make the best use of the technologies available to achieve my goals? Is it possible to use the Entity Framework (EF) to add a List of Accounts into the DataSet with little effort?

Your requirements:

  1. Perform a bulk data insertion, fast and safe.
  2. Insert data avoiding duplicates with the proper IDs of foreign tables.
  3. Make best use of the technologies available.
  4. Use Entity Framework (EF) to add a List of Accounts into the DataSet with little effort.

If you are using EF to insert data from your C# code, you can consider using parameterized SQL queries to protect your inserts against SQL injection attacks.

Using System.Data.SqlClient.SqlCommand.Parameters.Add:

MSDN: SqlCommand.Parameters Property

using System;
using System.Data;
using System.Data.SqlClient;

public void InsertCustomer(int customerID, DateTime activityDate) {
    const string sql = "INSERT INTO Customers (customerID, ActivityDate) VALUES (@customerID, @activityDate);";
    // YourConnectionString is a placeholder for your own connection string.
    using (SqlConnection connection = new SqlConnection(YourConnectionString))
    using (SqlCommand cmd = new SqlCommand(sql, connection)) {
        cmd.CommandType = CommandType.Text;
        // Parameter names match the placeholders in the SQL text (no trailing spaces).
        cmd.Parameters.Add("@customerID", SqlDbType.Int).Value = customerID;
        cmd.Parameters.Add("@activityDate", SqlDbType.DateTime).Value = activityDate;
        connection.Open();
        // Exceptions propagate to the caller with their original stack trace.
        cmd.ExecuteNonQuery();
    }
}
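The same pattern can be applied to the Account insert from the question. The following is only a sketch: it assumes that Number_Account, Number_do and Number_Client are integer columns (the original query passes them unquoted), and the class and method names are made up for illustration.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class AccountImporter
{
    // The statement from the question, with @parameters instead of string.Format.
    private const string InsertAccountSql = @"
INSERT INTO Account (Number_Account, DoID, ClientID)
SELECT @numberAccount,
       (SELECT id FROM AccountDO WHERE Number_do = @numberDo),
       (SELECT id FROM Client WHERE Number_Client = @numberClient)
WHERE NOT EXISTS (SELECT 1 FROM Account WHERE Number_Account = @numberAccount);";

    // Returns true when a row was inserted, false when the account already existed.
    // The connection is expected to be open; column types are assumed to be int.
    public static bool InsertAccount(SqlConnection connection,
                                     int numberAccount, int numberDo, int numberClient)
    {
        using (var cmd = new SqlCommand(InsertAccountSql, connection))
        {
            cmd.Parameters.Add("@numberAccount", SqlDbType.Int).Value = numberAccount;
            cmd.Parameters.Add("@numberDo", SqlDbType.Int).Value = numberDo;
            cmd.Parameters.Add("@numberClient", SqlDbType.Int).Value = numberClient;
            return cmd.ExecuteNonQuery() > 0;
        }
    }
}
```

The NOT EXISTS check alone does not protect against two imports running at the same time; a unique constraint on Number_Account is the more reliable guard against duplicates.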

However, your insert jobs should run faster if you use SSIS or T-SQL BULK INSERT.

Here are the resources I found:

Insert and Update Records with an SSIS ETL Package

  • SQL Server Integration Services (SSIS)
  • Extract Transform Load (ETL)

Bulk Import and Export of Data (SQL Server)

  • (cmd line) bcp utility
  • (T-SQL) BULK INSERT
  • (T-SQL) INSERT ... SELECT * FROM OPENROWSET(BULK...)

Optimizing Bulk Import Performance

  • Using minimal logging
  • Importing data in parallel from multiple clients to a single table
  • Using batches
  • Disabling triggers
  • Disabling constraints
  • Ordering the data in a data file
  • Controlling the locking behavior
  • Importing data in native format

Prerequisites for Minimal Logging in Bulk Import

Minimal logging requires that the target table meets the following conditions:

  • The table is not being replicated.
  • Table locking is specified (using TABLOCK).
  • The table is not a memory-optimized table.

Whether minimal logging can occur for a table also depends on whether the table is indexed and, if so, whether the table is empty:

  • If the table has no indexes, data pages are minimally logged.
  • If the table has no clustered index but has one or more nonclustered indexes, data pages are always minimally logged. How index pages are logged, however, depends on whether the table is empty:
    • If the table is empty, index pages are minimally logged.
    • If the table is non-empty, index pages are fully logged.
  • If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if a table has a clustered index and is non-empty, data pages and index pages are both fully logged regardless of the recovery model.

Bulk Inserts via TSQL in SQL Server

  • BULK INSERT - SQL Server 2005 & 2008
  • INSERT…SELECT * FROM OPENROWSET(BULK…) - SQL Server 2005 & 2008
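As a rough illustration of how these pieces might fit the Account import from the question, the sketch below loads the flat file into a staging table with BULK INSERT and then copies only the new accounts in one set-based statement. The staging table AccountStaging (with columns Number_Account, Number_do, Number_Client), the file path and the class name are assumptions made for this example; they do not come from the question.

```csharp
using System.Data.SqlClient;

public static class AccountBulkImporter
{
    // Step 1: load the flat file into a staging table (hypothetical name and path).
    // TABLOCK is one of the prerequisites for minimal logging listed above;
    // BATCHSIZE makes each 10,000 rows a separate batch.
    // The path must be readable by the SQL Server service, not by the web server.
    private const string LoadStagingSql = @"
TRUNCATE TABLE AccountStaging;
BULK INSERT AccountStaging
FROM 'C:\import\accounts.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 10000);";

    // Step 2: resolve the foreign keys with joins and skip accounts that already exist.
    // DISTINCT drops rows that are repeated verbatim in the file. Unlike the
    // subqueries in the question, the inner joins skip rows whose DO or client
    // number has no match instead of inserting NULL.
    private const string CopyNewAccountsSql = @"
INSERT INTO Account (Number_Account, DoID, ClientID)
SELECT DISTINCT s.Number_Account, d.id, c.id
FROM AccountStaging AS s
JOIN AccountDO AS d ON d.Number_do = s.Number_do
JOIN Client AS c ON c.Number_Client = s.Number_Client
WHERE NOT EXISTS (SELECT 1 FROM Account AS a
                  WHERE a.Number_Account = s.Number_Account);";

    // Returns the number of rows inserted into Account.
    public static int Import(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var load = new SqlCommand(LoadStagingSql, connection))
            {
                load.CommandTimeout = 0; // no timeout; large files can take a while
                load.ExecuteNonQuery();
            }
            using (var copy = new SqlCommand(CopyNewAccountsSql, connection))
            {
                return copy.ExecuteNonQuery();
            }
        }
    }
}
```

Keeping the raw file in a staging table means the duplicate check and the foreign-key lookups run once per import rather than once per row, which is usually where the row-by-row approach loses time.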

Assuming you have properly mapped your database (either code-first or database-first), you should have a handful of tables mapped to your context. For example:

public class DataModel : DbContext
{
    /* more code ... */

    public virtual DbSet<User> Users { get; set; }

    /* more code ... */
}

The DbSet class exposes an AddRange method you can use for bulk inserts. So, assuming you had a collection of User objects, you could do this:

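public class SomeClass
{
    // Returns the number of rows written to the database.
    public int InsertUsers(params User[] users)
    {
        using (var context = new DataModel())
        {
            context.Users.AddRange(users);
            // Nothing is sent to the database until SaveChanges is called.
            return context.SaveChanges();
        }
    }
}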

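The users will be inserted in one transaction when SaveChanges is called (assuming the underlying datastore supports transactions). Note that in EF6 each entity is still sent as its own INSERT statement, so this is a convenience rather than a true bulk load.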
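AddRange does not check whether a row already exists, so duplicates have to be filtered out before the entities are added. The sketch below shows one possible way to do that with EF6. The Account entity, its NumberAccount property, the ImportContext class and the helper names are hypothetical stand-ins for whatever the generated model actually contains.

```csharp
using System.Collections.Generic;
using System.Data.Entity;
using System.Linq;

// Hypothetical entity and context; the generated model will differ.
public class Account
{
    public int Id { get; set; }
    public int NumberAccount { get; set; }
}

public class ImportContext : DbContext
{
    public virtual DbSet<Account> Accounts { get; set; }
}

public static class AccountFilter
{
    // In-memory step: keep the first occurrence of each account number
    // that is not among the numbers already stored.
    public static List<Account> OnlyNew(IEnumerable<Account> incoming,
                                        IEnumerable<int> existingNumbers)
    {
        var seen = new HashSet<int>(existingNumbers);
        // HashSet.Add returns false for numbers that are already in the set.
        return incoming.Where(a => seen.Add(a.NumberAccount)).ToList();
    }
}

public static class EfAccountImporter
{
    // Returns the number of rows written by SaveChanges.
    public static int InsertNewAccounts(IReadOnlyCollection<Account> incoming)
    {
        using (var context = new ImportContext())
        {
            var numbers = incoming.Select(a => a.NumberAccount).Distinct().ToList();

            // One query fetches the account numbers that are already in the table.
            var existing = context.Accounts
                .Where(a => numbers.Contains(a.NumberAccount))
                .Select(a => a.NumberAccount)
                .ToList();

            context.Accounts.AddRange(AccountFilter.OnlyNew(incoming, existing));
            return context.SaveChanges();
        }
    }
}
```

This keeps the duplicate check down to a single query per import, but for very large files the Contains list grows with the input, and a staging table on the server is probably the better fit.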
