Remove records by clustered or non-clustered index

I have a table (let's say ErrorLog):

CREATE TABLE [dbo].[ErrorLog]
(
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Created] [datetime] NOT NULL,
    [Message] [varchar](max) NOT NULL,

    CONSTRAINT [PK_ErrorLog] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
)

I want to remove all records that are older than 3 months.

I have a non-clustered index on the Created column (ascending).

I am not sure which of these is better (they seem to take the same time).

Query #1 :

DELETE FROM ErrorLog
WHERE Created <= DATEADD(month, - 3, GETDATE())

Query #2 :

DECLARE @id INT

SELECT @id = max(l.Id)
FROM ErrorLog l
WHERE l.Created <= DATEADD(month, - 3, GETDATE())

DELETE FROM ErrorLog
WHERE Id <= @id

Once you know the maximum clustered key you want to delete, deleting by that key is definitely faster. The question is whether it is worth selecting this key first using the date. Note that Query #2 is only equivalent to Query #1 if Id grows in the same order as Created, which typically holds for a log table where rows are inserted in chronological order. The right decision depends on the size of the table and on what portion of the data you need to delete. The smaller the table and the smaller the number of records to delete, the more efficient the first option (Query #1) should be. However, if the number of records to delete is large enough, the non-clustered index on the Created column will be ignored and SQL Server will scan the base table instead. In that case the second option (Query #2) might be the better choice. There are usually other factors to consider as well.
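Since the answer depends on table size and on the share of rows being deleted, one way to check it for a given table is to run both variants with I/O and timing statistics enabled and roll each delete back. The following is only a sketch, assuming a test copy of the table is available (rolling back a large delete is expensive in itself) and that the `dbo` schema from the question applies:

```sql
-- Sketch only: run against a test copy, not the live table.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query #1: filter on the column covered by the non-clustered index.
BEGIN TRANSACTION;
    DELETE FROM dbo.ErrorLog
    WHERE Created <= DATEADD(month, -3, GETDATE());
ROLLBACK TRANSACTION;

-- Query #2: look up the boundary Id first, then delete by clustered key.
DECLARE @id INT;

SELECT @id = MAX(l.Id)
FROM dbo.ErrorLog AS l
WHERE l.Created <= DATEADD(month, -3, GETDATE());

BEGIN TRANSACTION;
    DELETE FROM dbo.ErrorLog
    WHERE Id <= @id;
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads and elapsed times in the Messages output, together with the actual execution plans, should show whether the index on Created is used for Query #1 or whether SQL Server falls back to a scan.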

I recently solved a similar problem (deleting about 600 million old records, about 2/3 of the data, from a 1.5 TB table) and in the end I chose the second approach. There were several reasons, but the main ones were as follows.

The table had to stay available for new inserts while the old records were being deleted. So I could not delete the records in one monstrous DELETE statement; instead I had to use several smaller batches to avoid lock escalation to the table level. Smaller batches also kept the transaction log size within reasonable limits. Furthermore, I had a maintenance window of only about one hour each day, and it was not possible to delete all the required records within one day.

With all of this in mind, the fastest solution for me was to select the maximum Id I needed to delete according to the date column and then delete from the beginning of the clustered index up to that Id, one batch after another ( DELETE TOP(@BatchSize) FROM ErrorLog WITH(PAGLOCK) WHERE Id <= @myMaxId ). I used the PAGLOCK hint so that I could increase the batch size without escalating the lock to the table level. In the end I deleted several batches each day.
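The batched delete described above could be wrapped in a loop roughly like this. It is a minimal sketch and not the exact script I ran: the batch size of 10000 and the 60-minute cut-off (`@StopAt`) are assumed values for illustration, and the table and column names are taken from the question.

```sql
-- Assumed values; tune the batch size and the time limit for your workload.
DECLARE @BatchSize INT = 10000;
DECLARE @StopAt DATETIME2 = DATEADD(minute, 60, SYSDATETIME());
DECLARE @myMaxId INT;
DECLARE @Deleted INT = 1;

-- Find the boundary Id once, using the index on Created.
SELECT @myMaxId = MAX(l.Id)
FROM dbo.ErrorLog AS l
WHERE l.Created <= DATEADD(month, -3, GETDATE());

-- Delete from the start of the clustered index in small batches.
-- Each DELETE commits on its own, so locks and log usage stay bounded.
WHILE @Deleted > 0 AND SYSDATETIME() < @StopAt
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.ErrorLog WITH (PAGLOCK)
    WHERE Id <= @myMaxId;

    -- Stop when nothing is left at or below the boundary Id.
    SET @Deleted = @@ROWCOUNT;
END;
```

If no row is older than three months, @myMaxId stays NULL, the first DELETE removes nothing, and the loop ends immediately. Running the same script again on the next day simply continues where the previous run stopped.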
