
Best way to update 40 million rows in batch

Basically I need to run this on a table with 40 million rows. Updating every row at once will crash, so I want to batch the query so that if it crashes, it can be re-run, skip the batches that already finished, and continue with the rows that are left over.

UPDATE [table]
SET [New_ID] = [Old_ID]

What is the fastest way to do this? Here is how the table is created:

CREATE TABLE [table](
    [INSTANCE_ID] [int] NOT NULL,
    [table_ID] [bigint] IDENTITY(1,1) NOT NULL,
    [old_ID] [bigint] NOT NULL,
    [new_ID] [bigint] NOT NULL,
    [owner_ID] [int] NOT NULL,
    [created_time] [datetime] NULL
) ON [PRIMARY]

There are also indexes on created_time and owner_ID.

EDIT: My update statement is EXACTLY as shown; I literally just need to copy every entry in old_ID into new_ID for all 40 million rows.

Declare @Rowcount INT = 1;

WHILE (@Rowcount > 0)   
BEGIN
        UPDATE TOP (100000) [table]   --<-- define Batch Size in TOP Clause
           SET [New_ID] = [Old_ID]
        WHERE [New_ID] <> [Old_ID]

        SET @Rowcount = @@ROWCOUNT;

        CHECKPOINT;   --<-- flush each batch and allow log truncation (simple recovery model)
END

M.Ali's suggestion will work, but you will end up with degrading performance as you work through the 40M records. I would suggest a better filter to find the records to update in each pass. This would assume you have a primary key (or other index) on your identity column:

DECLARE @Rowcount INT = 1
    ,   @BatchSize INT = 100000
    ,   @StartingRecord BIGINT = 1;

WHILE (@Rowcount > 0)   
BEGIN
    UPDATE [table]
        SET [New_ID] = [Old_ID]
    WHERE [table_ID] BETWEEN @StartingRecord AND @StartingRecord + @BatchSize - 1;

    SET @Rowcount = @@ROWCOUNT;

    CHECKPOINT;

    SELECT @StartingRecord += @BatchSize
END

This approach will allow each iteration to be as fast as the first. And if you don't have a valid index, you need to fix that first.
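
If [table_ID] does not already have an index, a minimal sketch of adding one might look like the following (the constraint name is made up; building a clustered index on 40M rows is itself a heavy operation, so do it in a maintenance window):

-- Hypothetical constraint name; [table_ID] is an IDENTITY column and should be unique.
ALTER TABLE [table]
    ADD CONSTRAINT [PK_table_table_ID] PRIMARY KEY CLUSTERED ([table_ID]);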

Select 1;  -- this sets @@ROWCOUNT so the loop condition is true the first time
WHILE (@@ROWCOUNT > 0)
BEGIN
  UPDATE TOP (1000000) [table]
    SET [New_ID] = [Old_ID]
  WHERE [New_ID] <> [Old_ID]
    OR ([New_ID] IS NULL AND [Old_ID] IS NOT NULL)
END

100000 may work better for the TOP clause.

Since New_ID and Old_ID are NOT NULL, the IS NULL check is not necessary.
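
Putting those two notes together, a simplified sketch of the loop (the batch size is just a starting point to tune):

SELECT 1;  -- prime @@ROWCOUNT so the loop body runs at least once
WHILE (@@ROWCOUNT > 0)
BEGIN
  UPDATE TOP (100000) [table]
    SET [New_ID] = [Old_ID]
  WHERE [New_ID] <> [Old_ID];  -- both columns are NOT NULL, so no IS NULL check
END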

The fastest way is to:

1) Create a new table and insert all the values from the old table into it using a create-from-select (SELECT ... INTO) statement, with whatever condition you need.

2) Copy the constraints and recreate the indexes.

3) Drop the old table.

4) Rename the new table to the original name.

A complete discussion is available at this link.
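
A rough T-SQL sketch of those four steps. Names like [table_new] and the index names are placeholders, and an actual new table is used rather than a #temp table, since a temp table cannot simply be renamed into a user table; test on a copy first, because a failure partway through leaves you without the original table:

-- 1) Copy the data into a new table, writing old_ID straight into new_ID.
--    ISNULL() keeps new_ID as NOT NULL in the table that SELECT ... INTO creates.
SELECT [INSTANCE_ID],
       [table_ID],
       [old_ID],
       ISNULL([old_ID], 0) AS [new_ID],
       [owner_ID],
       [created_time]
INTO   [table_new]
FROM   [table];

-- 2) Recreate the constraints and indexes (index names here are hypothetical).
CREATE INDEX [IX_table_created_time] ON [table_new] ([created_time]);
CREATE INDEX [IX_table_owner_ID]     ON [table_new] ([owner_ID]);

-- 3) Drop the old table.
DROP TABLE [table];

-- 4) Rename the new table to the original name.
EXEC sp_rename 'table_new', 'table';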
