简体   繁体   中英

Is there any library like SqlBulkCopy for deleting large number of rows in C# and SQL Server?

Using SqlBulkCopy I can insert a large number of rows in C# into SQL Server which is really very fast. I am in need of a similar thing for delete operations.

Is there anything available that will perform faster, high-performance delete operation in C#? Using a stored procedure with a list of row id as a parameter is not a good way. Calling the stored procedure many times is also not a good idea. So what is the best idea?

I need to delete around 30k rows from a table.

You can make an additional table with a column for your target IDs, and perhaps also a job ID. Then you can insert your IDs into this table. You'll end up with around 30k rows in the new table: one for each ID to delete. If your table's IDs are not the primary key, this is also the time to make that translation. You can let this part of the operation be a little slow if you need to, because it won't interfere with other parts of your database in terms of locks, or you can use a bulk insert technique, which you already seem comfortable with.

Once this table is populated, you can write a DELETE statement that includes a JOIN to this new table . In my experience, this will be the fastest way to complete your delete operation.

This doesn't sound like your situation, but when you really need it to run faster you can also try running the job during a period of lower load, or if you have a maintenance window you can switch to bulk-logged for a while, though this is an option of last resort.

After completing the query, truncate the extra table or delete rows with your job ID.

If this still isn't fast enough, you can improve things further by implementing logical deletes in your application. A logical delete is when you add a column in the table with a name like IsDeleted or DeletedDate . Then, to delete a row you only need to update the value of that column. You can also have a separate process that runs in the background to clean up these records after a certain amount of time if you want.

Logical deletes can be much faster for processing larger batches. However, it can mean re-writing significant parts of your application to understand the difference between a deleted record and an active record.

You can use the MERGE instruction to massively delete data in SQL Server.

You need to specify primary keys of the table you want to delete from, and make it match on your actual table. Then you can decide what to do with matched or not matched records, in you case delete when match:

create table MyTable ( id int primary key, data varchar(5) )
insert into MyTable values (1, 'a'), (2, 'b'), (3, 'c')

merge MyTable as t
using
(
    select * from
    (
        values (1), (2)
    ) V(id)
) as x
on x.id = t.id
when matched then delete;

select * from MyTable -- only pk 3 left

This works very well when deleting large amount of data and it's enough easy to implement in C#, although it has not the same performance of SqlBulkCopy .

MERGE docs: https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql

I would normally re-use a Custom Table Type as part of an extended CRUD set of stored procedures when this situation arises.

CREATE PROCEDURE MyTableDeletByPrimaryKeyInBulk(@PrimaryKeyTable IntegerIDTable)
AS 
    DELETE D
    FROM MyTable D
    INNER JOIN @PrimaryKeyTable P ON P.ID=D.TableID

Of course, the mileage may vary based on the number and types of indexes and how busy the server is. I ran this against a pretty loaded server. I set the maximum threshold to 100K records and it built up the data and performed the delete operation in less than 1 second.

DECLARE @RecordCount INT = 100000
;
WITH R1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(10))dt(n)),
R2(N) AS (SELECT 1 FROM R1 a, R1 b),
R3(N) AS (SELECT 1 FROM R2 a, R2 b), 
R4(N) AS (SELECT 1 FROM R3 a, R3 b),
Tally(Number) AS (SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM R4)

INSERT
INTO _DeleteTest
SELECT Number,'Hey' FROM Tally
WHERE
    Number<=100000

DECLARE @T TABLE(ID INT)
INSERT INTO @T SELECT ID FROM _DeleteTest

DELETE D
FROM _DeleteTest D
INNER JOIN @T P ON P.ID=D.ID


(100000 rows affected)

(100000 rows affected)

(100000 rows affected)

Completion time: 2020-11-13T13:53:54.2883938-05:00

I dont remember to see a BulkDelete, but according[Microsoft][1] https://social.tec.net.microsoft.com/wiki/contents/articles/20651.sql-server-delete-a-huge-amount-of-data-from-a-table.aspx [1]:

You just apply a truncate table TableName, it is like a Bulk and empty all record in the table, if is not like this follow the previus advice from @Phate01

Simply use smaller delete in loop works for me in "big delete" scenario

plus: this not lock all table for a long time.

cons: no benefit if used in a single transaction

--declare @i int =0
SELECT NULL --Makes the WHILE loop work.
WHILE @@ROWCOUNT <> 0
BEGIN
    --set @i=@i+1
    --print(@i)
    DELETE TOP (5000) FROM yourtable 
    WHERE ...
    
END

30K is a small number of rows for sql,i use this to delete milions of rows in huge table.

Coluld be some other problem like bad indexing, lock, heap table ecc.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM