What is the best way to delete millions of records in T-SQL?

Question

I have the following table structure:

Table1       Table2       Table3
--------------------------------
sId          sId          sId
name         x            y
x1           x2           x3

I want to remove all records from Table1 that do not have a matching record in Table3 based on sId. However, if the sId is present in Table2, the record must not be deleted from Table1. There are about 20, 15, and 10 million records in Table1, Table2, and Table3 respectively. I have done something like this:

Delete Top (3000000)
        From Table1 A
        Left Join Table2 B
        on A.Name ='XYZ' and
           B.sId = A.sId
        Left Join Table3 C
        on A.Name = 'XYZ' and
           C.sId = A.sId

(I have added an index on sId but not on Name.) This takes a long time to remove the records. Is there a better way to delete millions of records? Thanks in advance.

Answer 1

Do it in batches of 5,000 or 10,000 rows if you need to delete less than about 40% of the data. If you need to delete more than that, dump the rows you want to keep into another table (or bcp them out), truncate this table, and then insert the saved rows back from the other table (or bcp them in).

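SELECT 1  -- prime @@ROWCOUNT so the loop is entered at least once

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (5000) A
    FROM Table1 A
    LEFT JOIN Table2 B
        ON B.sId = A.sId
    LEFT JOIN Table3 C
        ON C.sId = A.sId
    WHERE A.Name = 'XYZ'     -- moved out of the ON clauses so it actually restricts Table1
      AND B.sId IS NULL      -- sId is not present in Table2
      AND C.sId IS NULL      -- no matching row in Table3
END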

Here is a small example you can run to see what happens:

CREATE TABLE #test(id INT)

INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)

-- @@ROWCOUNT is 1 here because of the last INSERT, so the loop is entered
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (2) FROM #test
END

Answer 2

One way to remove millions of records is to select the remaining records into new tables, then drop the old tables and rename the new ones. Which variant is best for you depends on the foreign keys: you can either drop and recreate the foreign keys, or truncate the data in the old tables and copy the selected data back.

If you only need to delete a few records, disregard this answer. It applies when you actually want to DELETE millions of records.
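
A minimal sketch of the copy, drop, and rename variant, applied to the tables from the question. The name Table1_new is made up for illustration. The sketch assumes nothing references Table1 through a foreign key (DROP TABLE would fail otherwise), and it leaves out the Name = 'XYZ' condition from the question's query for brevity.

```sql
-- 1. Copy the rows to keep into a new table (Table1_new is a hypothetical name).
--    A row is kept if its sId appears in Table2 or has a match in Table3.
SELECT A.*
INTO Table1_new
FROM Table1 AS A
WHERE EXISTS (SELECT 1 FROM Table2 AS B WHERE B.sId = A.sId)
   OR EXISTS (SELECT 1 FROM Table3 AS C WHERE C.sId = A.sId);

-- 2. SELECT ... INTO does not copy indexes, constraints, or triggers,
--    so recreate whatever Table1 had on Table1_new at this point.

-- 3. Swap the tables inside one transaction so a failure leaves Table1 intact.
BEGIN TRANSACTION;
    DROP TABLE Table1;
    EXEC sp_rename 'Table1_new', 'Table1';
COMMIT TRANSACTION;
```

It is probably worth comparing the row count of Table1_new against the expected number of surviving rows before running step 3.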

Answer 3

Using the TOP clause is more for improving concurrency and may actually make the code run slower overall.

One suggestion is to delete the data from a derived table: http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx
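
As a rough illustration of deleting through a derived table (a sketch of the general idea, not code taken from the linked post), the derived table below selects the next batch in sId order and the DELETE removes exactly those rows. The batch size of 5000 and the alias batch are arbitrary choices, and the Name = 'XYZ' condition from the question is left out for brevity.

```sql
DECLARE @rows INT = 1;  -- rows removed by the previous batch; start at 1 to enter the loop

WHILE @rows > 0
BEGIN
    -- The derived table picks the next 5000 candidate rows in sId order,
    -- and deleting from its alias removes those rows from Table1.
    DELETE batch
    FROM (
        SELECT TOP (5000) A.*
        FROM Table1 AS A
        WHERE NOT EXISTS (SELECT 1 FROM Table2 AS B WHERE B.sId = A.sId)
          AND NOT EXISTS (SELECT 1 FROM Table3 AS C WHERE C.sId = A.sId)
        ORDER BY A.sId
    ) AS batch;

    SET @rows = @@ROWCOUNT;  -- 0 once nothing is left to delete
END;
```

Capturing @@ROWCOUNT in a variable right after the DELETE avoids depending on whatever statement happened to run before the loop.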

Answer 4

Have you set up appropriate indexes on the relevant table fields? If not it could take a long time to delete the records.

Answer 5

The DELETE operation you're performing is running an underlying SELECT statement to find the records that will be deleted. The operation you're doing is fundamentally a simple join. If you optimize that join, the final DELETE will be faster, too.

Make sure you have indexes on the columns you are joining on. Check the execution plan to make sure they are being used.
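
For the tables in the question, that might look something like the sketch below. All index names are hypothetical, and any index that already exists (the question mentions one on sId) should be skipped. The final SELECT uses the same predicates a delete would, so its plan can be inspected safely before anything is removed.

```sql
-- Join columns used to look up matches (hypothetical index names).
CREATE INDEX IX_Table2_sId ON Table2 (sId);
CREATE INDEX IX_Table3_sId ON Table3 (sId);

-- The question's query also filters on Name, which is not indexed there.
CREATE INDEX IX_Table1_Name_sId ON Table1 (Name, sId);

-- Dry run: count the rows a delete would target and look at this
-- statement's execution plan to confirm the indexes are used.
SELECT COUNT(*)
FROM Table1 AS A
WHERE A.Name = 'XYZ'
  AND NOT EXISTS (SELECT 1 FROM Table2 AS B WHERE B.sId = A.sId)
  AND NOT EXISTS (SELECT 1 FROM Table3 AS C WHERE C.sId = A.sId);
```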

Answer 6

One other method is to insert the data that you want to keep into another table, say Table1_good. Once that is completed and verified, drop Table1 and then rename Table1_good to Table1.

Dirty way to do it but it works.

Answer 7

Once you have cleaned up the data, I would put an AFTER DELETE trigger on Table3 that automatically deletes the applicable records from Table1. This way you keep the data cleaned up in real time and never have to delete huge chunks.
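
A sketch of such a trigger, assuming the same rules as in the question: a Table1 row goes away only when its sId no longer has any match in Table3 and is not present in Table2. The trigger name is made up for illustration.

```sql
CREATE TRIGGER trg_Table3_AfterDelete  -- hypothetical name
ON Table3
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the rows just removed from Table3 by the triggering statement.
    DELETE A
    FROM Table1 AS A
    JOIN deleted AS d
        ON d.sId = A.sId
    WHERE NOT EXISTS (SELECT 1 FROM Table2 AS B WHERE B.sId = A.sId)   -- Table2 protects the row
      AND NOT EXISTS (SELECT 1 FROM Table3 AS C WHERE C.sId = A.sId);  -- another Table3 row still matches
END;
```

The second NOT EXISTS only matters if sId can repeat in Table3. If sId is unique there, it can be dropped.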

Answer 8

I'd create a temp table, populate it with a SELECT, add indexes to the temp table, and then delete from the table I want to remove records from. I would drop the temp table when I'm done. Something like this:

SELECT * INTO #temp
FROM mytable
WHERE blah blah  -- (or your query)

-- add constraints if you want

I would just put the primary key into the temp table, and then I would say:

DELETE mytable WHERE myPrimarykey IN (SELECT myPrimarykey FROM #temp)
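
Applied to the tables from the question, this approach could be sketched as follows. It assumes sId uniquely identifies a row in Table1. The temp table name #ToDelete and the index name are made up, and the Name = 'XYZ' condition is left out for brevity.

```sql
-- 1. Collect only the keys of the rows to delete.
SELECT A.sId
INTO #ToDelete
FROM Table1 AS A
WHERE NOT EXISTS (SELECT 1 FROM Table2 AS B WHERE B.sId = A.sId)
  AND NOT EXISTS (SELECT 1 FROM Table3 AS C WHERE C.sId = A.sId);

-- 2. Index the temp table so the join in the delete is cheap.
CREATE CLUSTERED INDEX IX_ToDelete_sId ON #ToDelete (sId);

-- 3. Delete by joining on the collected keys.
DELETE A
FROM Table1 AS A
JOIN #ToDelete AS t
    ON t.sId = A.sId;

-- 4. Clean up.
DROP TABLE #ToDelete;
```

With millions of keys, step 3 is still a single large delete. It could be combined with the TOP-based batching from Answer 1 if log growth or blocking is a concern.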

8 answers

solution1
10 ACCEPTED 2010-12-29 20:29:30

solution2
2 2010-12-29 20:31:47

solution3
1 2010-12-29 20:56:06

solution4
0 2010-12-29 20:29:48

solution5
0 2010-12-29 20:30:47

solution6
0 2010-12-29 20:34:20

solution7
0 2010-12-29 22:40:22

solution8
0 2010-12-30 06:39:31
