SQL Server update query extremely slow
I want to run a simple update statement on a table with c. 200m records. However, it seems to be taking ages.
UPDATE a
SET hybrid_trade_flag = CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END
FROM [tbl_master_trades] a
LEFT JOIN [tbl_hybrid_trades_subset] b
ON a.trade_id = b.trade_id
The 1st table, tbl_master_trades, has c. 200m records and has a composite index on trade_id and another column. The 2nd table, tbl_hybrid_trades_subset, has around 200k records. This query ran for over 40 minutes before I had to cancel it (the cancellation itself took around 30 minutes).
I thought maybe converting the 2nd table into a temp table and splitting the statement would help, so I converted it into the following:
UPDATE a
SET hybrid_trade_flag = 'Y'
FROM [tbl_master_trades] a
INNER JOIN #tmp_hybrid_trades b
ON a.trade_id = b.trade_id
UPDATE a
SET hybrid_trade_flag = 'N'
FROM [tbl_master_trades] a
WHERE hybrid_trade_flag IS NULL
Even the two queries above took 30 minutes to run. I need to run several such updates (c. 80) on the 1st table, so I'm not sure this is viable, as it would take days. Can someone please advise on whether and how I can speed this up?
I would start by rewriting the query to use exists:
update t
set hybrid_trade_flag = case
when exists(select 1 from tbl_hybrid_trades_subset ts where ts.trade_id = t.trade_id)
then 'Y'
else 'N'
end
from tbl_master_trades t
Then, I would recommend an index on tbl_hybrid_trades_subset(trade_id) so the subquery can execute quickly.
An index on tbl_master_trades(trade_id) might also help (without any other column in the index), but the index on the table that the subquery addresses seems more important.
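The two index suggestions above could be written roughly as follows. The index names are hypothetical, and this is only a sketch: if the existing composite index on tbl_master_trades already has trade_id as its leading column, the second index may add little.

```sql
-- Index names below are made up for illustration; use your own naming convention.

-- Lets the EXISTS subquery seek on trade_id instead of scanning the 200k-row table.
CREATE INDEX ix_hybrid_trades_subset_trade_id
    ON tbl_hybrid_trades_subset (trade_id);

-- Optional: a single-column index on the large table.
-- Building it on c. 200m rows takes time and space, so check the plan first.
CREATE INDEX ix_master_trades_trade_id
    ON tbl_master_trades (trade_id);
```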
That said, 200M rows is still a large number of rows to process, so the query will probably take quite a lot of time anyway.
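One common way to make such a large update more manageable is to run it in batches, so that each batch commits on its own and cancelling does not have to roll back the whole statement. This is not part of the suggestion above, just a minimal sketch. It assumes that rows not yet processed have a NULL hybrid_trade_flag (as the question's second query suggests), and the batch size of 100,000 is an arbitrary starting point.

```sql
-- Assumption: unprocessed rows have hybrid_trade_flag IS NULL.
-- Assumption: 100000 is an arbitrary batch size; tune it for your system.
DECLARE @batch_size INT = 100000;
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    -- Each UPDATE runs as its own autocommit transaction,
    -- so a cancel only rolls back the batch in flight.
    UPDATE TOP (@batch_size) t
    SET hybrid_trade_flag = CASE
            WHEN EXISTS (SELECT 1
                         FROM tbl_hybrid_trades_subset ts
                         WHERE ts.trade_id = t.trade_id)
            THEN 'Y'
            ELSE 'N'
        END
    FROM tbl_master_trades t
    WHERE t.hybrid_trade_flag IS NULL;

    -- Stop once no unprocessed rows remain.
    SET @rows = @@ROWCOUNT;
END
```

Without an index that helps locate the remaining NULL rows, each batch may have to scan past rows that were already updated, so the total run time is not necessarily shorter. The main gain is smaller transactions and less log growth per statement.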
You're running into 2 problems
To work around this you can
The index in question would be
CREATE INDEX idx_trade_id ON tbl_hybrid_trades_subset (trade_id)
To limit the number of rows updated, use this:
UPDATE a
SET hybrid_trade_flag = CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END
FROM [tbl_master_trades] a
LEFT OUTER JOIN [tbl_hybrid_trades_subset] b
ON b.trade_id = a.trade_id
WHERE hybrid_trade_flag != CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END
The first time might still take a while, but subsequent updates should be quite a bit faster.
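One caveat with the filter above: in SQL Server, comparing NULL with != evaluates to UNKNOWN, so rows whose hybrid_trade_flag is still NULL would be skipped. If the column is nullable (the question's second query suggests it is, but that is an assumption), a variant along these lines would include those rows as well:

```sql
-- Assumption: hybrid_trade_flag is nullable and starts out as NULL.
UPDATE a
SET hybrid_trade_flag = CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END
FROM [tbl_master_trades] a
LEFT OUTER JOIN [tbl_hybrid_trades_subset] b
    ON b.trade_id = a.trade_id
-- Update rows that have no flag yet, or whose flag differs from the new value.
WHERE a.hybrid_trade_flag IS NULL
   OR a.hybrid_trade_flag <> CASE WHEN b.trade_id IS NOT NULL THEN 'Y' ELSE 'N' END;
```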