TSQL Eager Spool on simple update of large table

While attempting a very simple UPDATE on a large table (1.3 billion records), the statement takes well over an hour. The execution plan shows an Eager Spool, so I assume it is copying the table, in large part or in whole, to tempdb before effecting the change.

For my purposes, I am looping through a series of candidate updates, and each pass needs to change only 0 - 10 records in this large table and move on in sub-second fashion. Any ideas on how to make this happen? I have tried hints and changing index structures, but am open to almost any idea.

Table Layout:

CREATE TABLE [dbo].[its_financial_suppl_jnl]
(
   [financial_suppl_jnl_key] [bigint] NOT NULL IDENTITY(1000000, 1),
   -- ... { omitting several column definitions }
   [location_key] [int] NULL,
   -- ... { omitting several column definitions }
) ON [TA2]
GO
ALTER TABLE [dbo].[its_financial_suppl_jnl] ADD CONSTRAINT [PK_its_financial_suppl_jnl] PRIMARY KEY CLUSTERED ([financial_suppl_jnl_key])
GO
-- ... { omitting 3 non-clustered index definitions }
CREATE NONCLUSTERED INDEX [tmp1] ON [dbo].[its_financial_suppl_jnl] ([location_key], [financial_suppl_jnl_key]) ON [TA2]
-- ... { omitting 12 FK definitions }

Sample Update Statement:

UPDATE its_financial_suppl_jnl
SET location_key = 964672 
WHERE location_key = 507289

(It's interesting to note that the above query would update 0 records, as location_key 507289 does not exist in the table.)
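
For what it's worth, a cheap existence probe (my sketch, not part of the original question) would confirm that up front; with the tmp1 index above, the probe is a single seek:

IF EXISTS (SELECT 1
           FROM dbo.its_financial_suppl_jnl
           WHERE location_key = 507289)
BEGIN
    UPDATE dbo.its_financial_suppl_jnl
    SET location_key = 964672
    WHERE location_key = 507289;
END;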

You have not presented enough of your code for me to know the details of how you are doing the looping or what kind of system you have, but let me offer some thoughts on the principle.

When you are looping through a large table doing updates, it is critical that each update statement can find the applicable records quickly. It appears you are doing that in this case: the index tmp1 matches the location_key filter in your update statement. I have seen numerous solutions on this site that suggest driving an update with something like SET columnA = columnB WHERE columnA != columnB, which seems fine for the first few loops but yields progressively worse performance as the looping continues, because each pass must scan further past already-updated rows to find work. A keyset-driven loop avoids that trap; see the sketch below.
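
To make that concrete, here is a minimal sketch of a keyset-driven loop (the variable names and batch size are illustrative, not from the question): it walks the clustered primary key in ranges, so every pass seeks straight to new rows instead of re-scanning rows it has already visited.

DECLARE @from_key bigint = 0;
DECLARE @to_key bigint;
DECLARE @batch_size int = 50000;

WHILE 1 = 1
BEGIN
    -- Upper boundary of the next clustered-key range.
    SELECT @to_key = MAX(k.financial_suppl_jnl_key)
    FROM (SELECT TOP (@batch_size) financial_suppl_jnl_key
          FROM dbo.its_financial_suppl_jnl
          WHERE financial_suppl_jnl_key > @from_key
          ORDER BY financial_suppl_jnl_key) AS k;

    IF @to_key IS NULL BREAK;   -- past the end of the table

    -- The filter is bounded by the key range, so this pass never
    -- revisits rows handled by an earlier pass.
    UPDATE dbo.its_financial_suppl_jnl
    SET location_key = 964672
    WHERE financial_suppl_jnl_key > @from_key
      AND financial_suppl_jnl_key <= @to_key
      AND location_key = 507289;

    SET @from_key = @to_key;
END;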

The point of looping through the records, a manageable number at a time, is defeated if all the loops are part of the same transaction. The easiest way to avoid that is to issue a CHECKPOINT with each loop. Alternatively, you can use TRY/CATCH and handle transactions manually if you want more control or information; a sketch follows. If you truly need all-or-nothing behavior (one big transaction), you can certainly do that, but then the looping buys you nothing: the whole table stays locked until the transaction is committed or rolled back, and the looping itself adds a small performance cost.
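
A sketch of what that per-loop handling might look like, combining TRY/CATCH, an explicit transaction per batch, and the CHECKPOINT mentioned above (the batch size is illustrative, and THROW assumes SQL Server 2012 or later):

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- TOP keeps each transaction to a manageable batch.
        UPDATE TOP (10000) dbo.its_financial_suppl_jnl
        SET location_key = 964672
        WHERE location_key = 507289;

        SET @rows = @@ROWCOUNT;

        COMMIT TRANSACTION;
        CHECKPOINT;   -- once per loop, as suggested above
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW;        -- re-raise the error after rolling back
    END CATCH
END;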

If you take these things into account, the whole batch will still take a while, but the individual updates should be quick and should leave your table operationally accessible. If you do all of this and are still having issues, please be more specific about what you are seeing and include the execution plan.

As for how long it should take to update 1.3 billion records: that really depends on your system, and on its I/O performance in particular. And do not forget that you are not just updating the clustered data; you are also updating every index that contains an affected column. Without knowing more about your data, your system, and the actual amount of time it is taking, I cannot say whether what you are seeing is out of line.
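
If you want to see which indexes an update to location_key would have to maintain, the catalog views can tell you; a minimal sketch (run it in the database that holds the table):

SELECT i.name AS index_name,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.its_financial_suppl_jnl')
  AND c.name = N'location_key';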
