
High-performance inserts and duplicate control with EF

I know the basics of SQL, so I have learned to create insert queries like the following one:

queryAccount.AppendLine(
    string.Format(
        "Insert INTO Account(Number_Account, DoID, ClientID) Select "
            + "{0}, "
            + "(Select id From AccountDO Where Number_do = {1}), "
            + "(Select id From Client Where Number_Client = {2})"
            + "Where not exists(Select * From Account Where Number_Account = {0});",
        item.Client.NumberAccount,
        item.Client.NumberDo,
        item.Client.NumberClient));

In that query I insert data into an "Account" table that has two FKs (DoID and ClientID), and I also check whether that account already exists. Usually, to insert data from a flat file, I use a StringBuilder to build multiple insert queries.

This works well in projects with modest requirements, but now I have a bigger challenge on my hands. I need to create a web site that imports new data on a daily basis, so it's important that the "import module" follows best practices.

What I have done so far:

  • My database model: around 20 tables with all kinds of relationships (many-to-many included);
  • Used the Entity Model Code Generator to generate the corresponding model in the ASP.NET project.

What I need to do:

  • Perform a bulk data insertion; the way I'm doing it now is slow and probably not safe;
  • Insert that data avoiding duplicates and with the proper IDs of foreign tables;

That is why I need your help: how can I make the best use of the available technologies to achieve my goals? Is it possible to use Entity Framework (EF) to add a List of Accounts into the DataSet with little effort?

Your requirements:

  1. Perform a bulk data insertion, fast and safe.
  2. Insert data avoiding duplicates with the proper IDs of foreign tables.
  3. Make best use of the technologies available.
  4. Use Entity Framework (EF) to add a List of Accounts into the DataSet with little effort.

If you insert data from your C# code with hand-built SQL (as in your StringBuilder approach), consider using parameterized queries to make your inserts safer against SQL injection attacks.

Using System.Data.SqlClient.SqlCommand.Parameters.Add:

MSDN: SqlCommand.Parameters Property

// Requires: using System; using System.Data; using System.Data.SqlClient;
public void InsertCustomer(int customerID, DateTime activityDate) {
    string sql = "INSERT INTO Customers (customerID, ActivityDate) VALUES (@customerID, @activityDate);";

    using (SqlConnection connection = new SqlConnection(YourConnectionString))
    using (SqlCommand cmd = new SqlCommand(sql, connection)) {
        cmd.CommandType = CommandType.Text;
        // Typed parameters keep the values out of the SQL text, which defeats injection.
        cmd.Parameters.Add("@customerID", SqlDbType.Int).Value = customerID;
        cmd.Parameters.Add("@activityDate", SqlDbType.DateTime).Value = activityDate;

        connection.Open();
        cmd.ExecuteNonQuery();
    }
}
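
Applied to your Account insert, the same duplicate check can be kept while the values travel as parameters instead of being concatenated into the SQL text. This is only a sketch: it assumes the table and column names from your question and that the three numbers are integers.

// Sketch only: table/column names come from the question; the int types are an assumption.
public void InsertAccount(int numberAccount, int numberDo, int numberClient, string connectionString) {
    const string sql =
        "INSERT INTO Account (Number_Account, DoID, ClientID) " +
        "SELECT @numberAccount, " +
        "       (SELECT id FROM AccountDO WHERE Number_do = @numberDo), " +
        "       (SELECT id FROM Client WHERE Number_Client = @numberClient) " +
        "WHERE NOT EXISTS (SELECT 1 FROM Account WHERE Number_Account = @numberAccount);";

    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(sql, connection)) {
        cmd.Parameters.Add("@numberAccount", SqlDbType.Int).Value = numberAccount;
        cmd.Parameters.Add("@numberDo", SqlDbType.Int).Value = numberDo;
        cmd.Parameters.Add("@numberClient", SqlDbType.Int).Value = numberClient;

        connection.Open();
        cmd.ExecuteNonQuery();   // inserts nothing if the account number already exists
    }
}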

However, your insert jobs should run faster if you use SSIS or T-SQL bulk inserts (a sketch of the staging-table pattern follows the resource list below).

Here are the resources I found:

Insert and Update Records with an SSIS ETL Package

  • SQL Server Integration Services (SSIS)
  • Extract Transform Load (ETL)

Bulk Import and Export of Data (SQL Server)

  • (cmd line) bcp utility
  • (T-SQL) BULK INSERT
  • (T-SQL) INSERT ... SELECT * FROM OPENROWSET(BULK...)
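
If the daily file is large, a common pattern with the T-SQL options above is to bulk-load the raw rows into a staging table first and then do the duplicate filtering and FK resolution in one set-based statement. A rough sketch from C#, assuming a hypothetical dbo.Account_Staging table that mirrors the flat-file columns and a file path the SQL Server service can reach:

// Sketch only. "dbo.Account_Staging" and the file path are hypothetical;
// the Account/AccountDO/Client column names come from the question.
public void ImportAccounts(string connectionString) {
    const string bulkLoad =
        "BULK INSERT dbo.Account_Staging " +
        "FROM 'C:\\import\\accounts.csv' " +                    // file must be reachable by SQL Server
        "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK);";

    const string mergeIntoAccount =
        "INSERT INTO Account (Number_Account, DoID, ClientID) " +
        "SELECT s.Number_Account, d.id, c.id " +
        "FROM Account_Staging s " +
        "JOIN AccountDO d ON d.Number_do = s.Number_do " +
        "JOIN Client c ON c.Number_Client = s.Number_Client " +
        "WHERE NOT EXISTS (SELECT 1 FROM Account a " +
        "                  WHERE a.Number_Account = s.Number_Account);";

    using (var connection = new SqlConnection(connectionString)) {
        connection.Open();
        using (var cmd = new SqlCommand(bulkLoad, connection))
            cmd.ExecuteNonQuery();          // fast load of raw rows into staging
        using (var cmd = new SqlCommand(mergeIntoAccount, connection))
            cmd.ExecuteNonQuery();          // one set-based pass resolves FKs and skips duplicates
    }
}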

Optimizing Bulk Import Performance

  • Using minimal logging
  • Importing data in parallel from multiple clients to a single table
  • Using batches
  • Disabling triggers
  • Disabling constraints
  • Ordering the data in a data file
  • Controlling the locking behavior
  • Importing data in native format

Prerequisites for Minimal Logging in Bulk Import

Minimal logging requires that the target table meets the following conditions:

  • The table is not being replicated.
  • Table locking is specified (using TABLOCK).
  • Table is not a memory-optimized table.

Whether minimal logging can occur for a table also depends on whether the table is indexed and, if so, whether the table is empty:

  • If the table has no indexes, data pages are minimally logged.
  • If the table has no clustered index but has one or more nonclustered indexes, data pages are always minimally logged. How index pages are logged, however, depends on whether the table is empty:
    • If the table is empty, index pages are minimally logged.
    • If the table is non-empty, index pages are fully logged.
  • If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if the table has a clustered index and is non-empty, data and index pages are both fully logged regardless of the recovery model.

Bulk Inserts via TSQL in SQL Server

  • BULK INSERT - SQL Server 2005 & 2008
  • INSERT…SELECT * FROM OPENROWSET(BULK…) - SQL Server 2005 & 2008

Assuming you have properly mapped your database (either code-first or database-first), you should have a handful of tables mapped to your context. For example:

public class DataModel : DbContext
{
    /* more code ... */

    public virtual DbSet<User> Users { get; set; }

    /* more code ... */
}

The DbSet class exposes an AddRange method you can use for bulk inserts. So, assuming you had a collection of User objects, you could do this:

public class SomeClass
{
    public int InsertUsers(params User[] users)
    {
        using (var context = new DataModel())
        {
            context.Users.AddRange(users);
            // Nothing hits the database until SaveChanges is called;
            // it returns the number of rows written.
            return context.SaveChanges();
        }
    }
}

Calling SaveChanges inserts the users in one transaction (assuming the underlying data store supports transactions).
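
AddRange by itself does not skip accounts that already exist, nor does it look up the FK ids for you, so something still has to do that before SaveChanges. One way, sketched here with hypothetical Account/AccountDO/Client entities and an ImportRow class standing in for a parsed flat-file line:

// Sketch only. Entity and property names (Accounts, AccountDOs, Clients, NumberAccount,
// NumberDo, NumberClient) and the ImportRow class are assumptions, not part of your model.
// Requires: using System.Collections.Generic; using System.Linq;
public int ImportAccounts(IEnumerable<ImportRow> rows)
{
    using (var context = new DataModel())
    {
        // Pull the existing keys once so duplicates can be filtered in memory.
        var existing  = new HashSet<int>(context.Accounts.Select(a => a.NumberAccount));
        var doIds     = context.AccountDOs.ToDictionary(d => d.NumberDo, d => d.Id);
        var clientIds = context.Clients.ToDictionary(c => c.NumberClient, c => c.Id);

        var newAccounts = rows
            .Where(r => !existing.Contains(r.NumberAccount))
            .Select(r => new Account
            {
                NumberAccount = r.NumberAccount,
                DoID = doIds[r.NumberDo],             // resolve the AccountDO foreign key
                ClientID = clientIds[r.NumberClient]  // resolve the Client foreign key
            })
            .ToList();

        context.Accounts.AddRange(newAccounts);
        return context.SaveChanges();                 // one transaction for the whole batch
    }
}

Loading all existing keys up front is fine for tables of modest size; for very large tables, the staging-table approach shown earlier keeps the duplicate check on the server instead.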
