Efficient SQL Server stored procedure

Question

I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a table of about 70 million rows by moving roughly 50 million of them to another table. id_col is an integer (the primary identity key).

According to my last run, it works correctly, but it is expected to take about 200 days:

SET NOCOUNT ON

    -- define the last ID handled
    DECLARE @LastID integer
    SET @LastID = 0
    declare @tempDate datetime
    set @tempDate = dateadd(dd,-20,getdate())
    -- define the ID to be handled now
    DECLARE @IDToHandle integer
    DECLARE @iCounter integer
    DECLARE @watch1 nvarchar(50)
    DECLARE @watch2 nvarchar(50)
    set @iCounter = 0
    -- select the next ID to handle
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
        and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col

    -- as long as we have s......    
    WHILE @IDToHandle IS NOT NULL
    BEGIN
        IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
        BEGIN
            INSERT INTO SECONDERY_TABLE
            SELECT col1,col2,col3.....
            FROM MAIN_TABLE WHERE id_col = @IDToHandle

            EXEC    [dbo].[DeleteByID] @ID = @IDToHandle --deletes the related rows from 2 other tables and then the row from MAIN_TABLE
            set @iCounter = @iCounter +1
        END
        IF (@iCounter % 1000 = 0)
        begin
            set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
            set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
            raiserror ( @watch1, 10,1) with nowait
            raiserror (@watch2, 10,1) with nowait
        end
        -- set the last ID handled to the one we just handled
        SET @LastID = @IDToHandle
        SET @IDToHandle = NULL

        -- select the next ID to handle
        SELECT TOP 1 @IDToHandle = id_col
        FROM MAIN_TABLE
        WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
            and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
        ORDER BY id_col
    END

Any ideas or directions for improving this procedure's run time are welcome.

Answer 1

Yes, try this:

Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
    And someDateCol < @tempDate -- If there are times in these datetime fields, 
                                -- then you may need to modify this condition.
    And some_other_int_col In (1745, 1548, 4785)
    And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                    Where some_int_col = m.id_col)
    And Not Exists (Select * From A_70k_rows_table
                    Where some_int_col = m.id_col)
Select id from @Ids  -- confirms that the code above generates the correct list of Ids
return -- stops here (no inserts/deletes) until you have verified @Ids is correct
-- Once you have verified that @Ids is correctly populated,
-- delete or comment out the Select and return lines above so the rest runs.

Begin Transaction

Delete ot     -- eliminates the row-by-row call to the second stored proc
From OtherTable ot
    Join MAIN_TABLE m On m.id_col = ot.FKCol
    Join @Ids i On i.id = m.id_col

Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
From MAIN_TABLE m
    Join @Ids i On i.id = m.id_col

Delete m      -- eliminates the row-by-row call to the second stored proc
From MAIN_TABLE m
    Join @Ids i On i.id = m.id_col

Commit Transaction

Explanation.

You had numerous filtering conditions that were not SARGable, i.e., they would force a scan on every iteration of your loop instead of being able to seek on any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. Wrapping the column this way eliminates the opportunity for the query optimizer to use an index on it.
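
To make the SARGability point concrete, here is a minimal sketch against a hypothetical temp table, #Demo. The table, its index, the @cutoff variable and the sample rows are assumptions for illustration and are not part of the original schema. Both queries select the same rows, but only the second compares the bare column to a value computed once, which is what allows an index on someDateCol to be used for a seek.

```sql
-- Hypothetical demo table; column names mirror the question, the data is made up.
CREATE TABLE #Demo (
    id_col      int IDENTITY(1,1) PRIMARY KEY,
    someDateCol datetime NOT NULL
);
CREATE INDEX IX_Demo_someDateCol ON #Demo (someDateCol);

INSERT INTO #Demo (someDateCol) VALUES (DATEADD(dd, -30, GETDATE()));
INSERT INTO #Demo (someDateCol) VALUES (DATEADD(dd, -25, GETDATE()));
INSERT INTO #Demo (someDateCol) VALUES (DATEADD(dd, -5,  GETDATE()));

DECLARE @tempDate datetime;
SET @tempDate = DATEADD(dd, -20, GETDATE());

-- Not SARGable: the column is wrapped in DATEDIFF, so the function has to be
-- evaluated for every row before the comparison can be made.
SELECT id_col
FROM #Demo
WHERE DATEDIFF(dd, someDateCol, @tempDate) > 0;

-- SARGable equivalent: truncate @tempDate to midnight once, then compare the
-- bare column against that value.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(dd, DATEDIFF(dd, 0, @tempDate), 0);

SELECT id_col
FROM #Demo
WHERE someDateCol < @cutoff;

DROP TABLE #Demo;
```

Truncating @tempDate to midnight keeps the day-boundary semantics of DATEDIFF(dd, ...), so the rewrite stays equivalent even when the datetime values carry a time portion.
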
You were executing the inserts one at a time. It is far better to generate the list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.
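
As a toy-scale illustration of the set-based approach, the sketch below moves rows between two hypothetical temp tables, #Main and #Secondary, which stand in for MAIN_TABLE and SECONDERY_TABLE. Their columns, the sample data and the col1 = 'move' filter are assumptions made up for the example.

```sql
-- Hypothetical stand-ins for MAIN_TABLE and SECONDERY_TABLE; columns are made up.
CREATE TABLE #Main      (id_col int PRIMARY KEY, col1 varchar(20) NOT NULL);
CREATE TABLE #Secondary (id_col int PRIMARY KEY, col1 varchar(20) NOT NULL);

INSERT INTO #Main (id_col, col1) VALUES (1, 'keep');
INSERT INTO #Main (id_col, col1) VALUES (2, 'move');
INSERT INTO #Main (id_col, col1) VALUES (3, 'move');

-- The list of keys to process, built once up front.
DECLARE @Ids TABLE (id int PRIMARY KEY NOT NULL);
INSERT INTO @Ids (id)
SELECT id_col FROM #Main WHERE col1 = 'move';

BEGIN TRANSACTION;

-- One INSERT copies every listed row, instead of one INSERT per id.
INSERT INTO #Secondary (id_col, col1)
SELECT m.id_col, m.col1
FROM #Main m
    JOIN @Ids i ON i.id = m.id_col;

-- One DELETE removes the same rows from the source.
DELETE m
FROM #Main m
    JOIN @Ids i ON i.id = m.id_col;

COMMIT TRANSACTION;

SELECT COUNT(*) AS remaining FROM #Main;      -- rows left in the source
SELECT COUNT(*) AS moved     FROM #Secondary; -- rows copied across

DROP TABLE #Main;
DROP TABLE #Secondary;
```

With the three sample rows, one row remains in #Main and two end up in #Secondary. The number of statements stays the same however many Ids are in the list.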
