简体   繁体   中英

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure that needs to "clean" a 70 mill table from about 50 mill rows to another table, the id_col is integer (primary identity key)

According to the last running I made it is working good but it is expected to last for about 200 days:

SET NOCOUNT ON

    -- define the last ID handled
    DECLARE @LastID integer
    SET @LastID = 0
    declare @tempDate datetime
    set @tempDate = dateadd(dd,-20,getdate())
    -- define the ID to be handled now
    DECLARE @IDToHandle integer
    DECLARE @iCounter integer
    DECLARE @watch1 nvarchar(50)
    DECLARE @watch2 nvarchar(50)
    set @iCounter = 0
    -- select the next  to handle    
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
        and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col

    -- as long as we have s......    
    WHILE @IDToHandle IS NOT NULL
    BEGIN
        IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
        BEGIN
            INSERT INTO SECONDERY_TABLE
            SELECT col1,col2,col3.....
            FROM MAIN_TABLE WHERE id_col = @IDToHandle

            EXEC    [dbo].[DeleteByID] @ID = @IDToHandle --deletes the row from 2 other tables that is related to the MAIN_TABLE and than from the MAIN_TABLE
            set @iCounter = @iCounter +1
        END
        IF (@iCounter % 1000 = 0)
        begin
            set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
            set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
            raiserror ( @watch1, 10,1) with nowait
            raiserror (@watch2, 10,1) with nowait
        end
        -- set the last  handled to the one we just handled
        SET @LastID = @IDToHandle
        SET @IDToHandle = NULL

        -- select the next  to handle    
        SELECT TOP 1 @IDToHandle = id_col
        FROM MAIN_TABLE
        WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
            and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
        ORDER BY id_col
    END

Any ideas or directions to improve this procedure run-time will be welcomed

Yes, try this:

Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
    And someDateCol < @tempDate -- If there are times in these datetime fields, 
                                -- then you may need to modify this condition.
    And some_other_int_col In (1745, 1548, 4785)
    And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                    Where some_int_col = m.id_col)
    And Not Exists (Select * From A_70k_rows_table
                    Where some_int_col = m.id_col)
Select id from @Ids  -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated, 
-- then delete or comment out the select and return lines above so insert runs.

      Begin Transaction
      Delete OT     -- eliminate row-by-row call to second stored proc
      From OtherTable ot
         Join MAIN_TABLE m On m.id_col = ot.FKCol
         Join @Ids i On i.Id = m.id_col 

      Insert SECONDERY_TABLE(col1, col2, etc.)
      Select col1,col2,col3.....
      FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col 

      Delete m   -- eliminate row-by-row call to second stored proc
      FROM MAIN_TABLE m 
      Join @Ids i On i.Id = m.id_col 

      Commit Transaction

Explaanation.

  1. You had numerous filtering conditions that were not SARGable, ie, they would force a complete table scan for every iteration of your loop, instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. This eliminates the opportunity for the query optimizer to use an index.

  2. You were executing the inserts one at a time... Way better to generate a list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM