Efficient SQL Server stored procedure
I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a 70-million-row table by moving about 50 million of its rows to another table. The id_col column is an integer (primary identity key).
Judging by my last run, it works correctly, but it is expected to take about 200 days to finish:
SET NOCOUNT ON
-- define the last ID handled
DECLARE @LastID integer
SET @LastID = 0
declare @tempDate datetime
set @tempDate = dateadd(dd,-20,getdate())
-- define the ID to be handled now
DECLARE @IDToHandle integer
DECLARE @iCounter integer
DECLARE @watch1 nvarchar(50)
DECLARE @watch2 nvarchar(50)
set @iCounter = 0
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
-- as long as we have s......
WHILE @IDToHandle IS NOT NULL
BEGIN
IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
BEGIN
INSERT INTO SECONDERY_TABLE
SELECT col1,col2,col3.....
FROM MAIN_TABLE WHERE id_col = @IDToHandle
EXEC [dbo].[DeleteByID] @ID = @IDToHandle --deletes the row from 2 other tables that is related to the MAIN_TABLE and than from the MAIN_TABLE
set @iCounter = @iCounter +1
END
IF (@iCounter % 1000 = 0)
begin
set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
raiserror ( @watch1, 10,1) with nowait
raiserror (@watch2, 10,1) with nowait
end
-- set the last handled to the one we just handled
SET @LastID = @IDToHandle
SET @IDToHandle = NULL
-- select the next to handle
SELECT TOP 1 @IDToHandle = id_col
FROM MAIN_TABLE
WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
ORDER BY id_col
END
Any ideas or directions for improving this procedure's run time would be welcome.
Yes, try this:
Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
And someDateCol < @tempDate -- If there are times in these datetime fields,
-- then you may need to modify this condition.
And some_other_int_col In (1745, 1548, 4785)
And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
Where some_int_col = m.id_col)
And Not Exists (Select * From A_70k_rows_table
Where some_int_col = m.id_col)
Select id from @Ids -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated,
-- then delete or comment out the select and return lines above so insert runs.
Begin Transaction
Delete OT -- eliminate row-by-row call to second stored proc
From OtherTable ot
Join MAIN_TABLE m On m.id_col = ot.FKCol
Join @Ids i On i.Id = m.id_col
Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col
Delete m -- eliminate row-by-row call to second stored proc
FROM MAIN_TABLE m
Join @Ids i On i.Id = m.id_col
Commit Transaction
Explanation.
You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan on every iteration of your loop instead of letting the query use an existing index. Always try to avoid filter conditions that apply processing logic (a function or expression) to a table column value before comparing it to some other value. Doing so removes the query optimizer's opportunity to use an index on that column.
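As a minimal sketch of the SARGability point, the script below rewrites the `DATEDIFF` test from the question so that the column stands alone on one side of the comparison. The temp table `#Demo`, its index name, the sample dates, and the `@cutoff` variable are all illustrative assumptions, not part of the original schema. On a table this small the optimizer may scan either way; the difference shows up on large indexed tables.

```sql
-- Hypothetical demo table; names and data are illustrative only.
CREATE TABLE #Demo (
    id_col      int IDENTITY(1,1) PRIMARY KEY,
    someDateCol datetime NOT NULL
);
CREATE INDEX IX_Demo_someDateCol ON #Demo (someDateCol);

INSERT INTO #Demo (someDateCol) VALUES ('20240101');
INSERT INTO #Demo (someDateCol) VALUES ('20240215');
INSERT INTO #Demo (someDateCol) VALUES ('20240301');

DECLARE @tempDate datetime;
SET @tempDate = '20240220';

-- Non-SARGable: the column is wrapped in DATEDIFF, so the function has to be
-- evaluated for every row and the index cannot be used for a seek.
SELECT id_col
FROM #Demo
WHERE DATEDIFF(dd, someDateCol, @tempDate) > 0;

-- SARGable: the bare column is compared with a value computed once.
-- DATEDIFF(dd, x, @tempDate) > 0 is true exactly when x falls before
-- midnight at the start of @tempDate's day, so strip the time portion first.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(dd, DATEDIFF(dd, 0, @tempDate), 0);

SELECT id_col
FROM #Demo
WHERE someDateCol < @cutoff;

DROP TABLE #Demo;
```

Both queries select the same rows; only the second form leaves the optimizer free to seek on the `someDateCol` index.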
You were executing the inserts one at a time. It is far better to generate the list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.
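With tens of millions of rows, a single transaction can make the transaction log very large. One possible variation, which is not part of the answer above, is to keep the set-based statements but work through the Id list in batches. The sketch below assumes hypothetical temp tables `#Main`, `#Archive`, `#Ids`, and `#Batch` standing in for the real tables, and a deliberately tiny `@BatchSize`; it has no error handling.

```sql
-- Hypothetical stand-ins for MAIN_TABLE, SECONDERY_TABLE and the Id list.
CREATE TABLE #Main    (id_col int PRIMARY KEY, col1 int);
CREATE TABLE #Archive (id_col int PRIMARY KEY, col1 int);
CREATE TABLE #Ids     (id int PRIMARY KEY);  -- all Ids selected for moving
CREATE TABLE #Batch   (id int PRIMARY KEY);  -- Ids handled in the current pass

INSERT INTO #Main (id_col, col1) VALUES (1, 10);
INSERT INTO #Main (id_col, col1) VALUES (2, 20);
INSERT INTO #Main (id_col, col1) VALUES (3, 30);
INSERT INTO #Main (id_col, col1) VALUES (4, 40);
INSERT INTO #Main (id_col, col1) VALUES (5, 50);

-- Assumption for the demo: every row except id 3 qualifies for moving.
INSERT INTO #Ids (id)
SELECT id_col FROM #Main WHERE id_col <> 3;

DECLARE @BatchSize int;
SET @BatchSize = 2;  -- tiny for the demo; a real run would use a much larger value

WHILE EXISTS (SELECT * FROM #Ids)
BEGIN
    -- Pick the next batch of Ids.
    DELETE FROM #Batch;
    INSERT INTO #Batch (id)
    SELECT TOP (@BatchSize) id FROM #Ids ORDER BY id;

    BEGIN TRANSACTION;

    -- Copy the batch to the archive table in one statement.
    INSERT INTO #Archive (id_col, col1)
    SELECT m.id_col, m.col1
    FROM #Main m
    JOIN #Batch b ON b.id = m.id_col;

    -- Remove the copied rows from the main table.
    DELETE m
    FROM #Main m
    JOIN #Batch b ON b.id = m.id_col;

    -- Mark the batch as done so the loop terminates.
    DELETE i
    FROM #Ids i
    JOIN #Batch b ON b.id = i.id;

    COMMIT TRANSACTION;
END

SELECT id_col, col1 FROM #Main;     -- rows that were not selected for moving
SELECT id_col, col1 FROM #Archive;  -- rows that were moved

DROP TABLE #Batch;
DROP TABLE #Ids;
DROP TABLE #Archive;
DROP TABLE #Main;
```

Each pass is still set-based, so it avoids the row-by-row cost of the original loop while keeping every transaction small.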