Create SQL indexes for speed
I am removing duplicates from a table with a transaction_id column and a last_modified_date column (see the query below). The idea is that I should have one record per transaction_id, so I need to remove duplicates, keeping the last modified record for a given transaction_id.
The query works but is slow.
The question is: what index should I create to speed up the query execution time?
With CTE_Duplicates as
(
select
transaction_id,
row_number() over (partition by transaction_id order by last_modified_date desc) rownumber
from
TRANSACTIONS
)
delete from CTE_Duplicates
where rownumber != 1;
Thanks!
Vald
For your version of the query:
With CTE_Duplicates as (
select t.*,
row_number() over (partition by transaction_id order by last_modified_date desc) as rownumber
from TRANSACTIONS t
)
delete from CTE_Duplicates
where rownumber > 1;
You want an index on (transaction_id, last_modified_date desc). However, with that same index, it might be faster to phrase the query as:
delete t from transactions t
where t.last_modified_date < (select max(t2.last_modified_date)
from transactions t2
where t2.transaction_id = t.transaction_id
);
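To try the suggested index and a correlated delete of this kind on a small scale, here is a minimal T-SQL sketch. The temp table #transactions, its amount column, the sample rows, and the index name ix_transactions_id_date are all made up for illustration; only transaction_id and last_modified_date come from the question.

```sql
-- Hypothetical scratch copy of the table; only the two key columns come from the question.
create table #transactions (
    transaction_id     int           not null,
    last_modified_date datetime2(0)  not null,
    amount             decimal(10,2) null      -- illustrative payload column
);

insert into #transactions (transaction_id, last_modified_date, amount)
values (1, '2020-01-01', 10.00),
       (1, '2020-01-05', 12.00),
       (2, '2020-02-01', 20.00),
       (3, '2020-03-01', 30.00),
       (3, '2020-03-02', 31.00),
       (3, '2020-03-03', 32.00);

-- The composite index discussed above (the index name is an assumption).
create index ix_transactions_id_date
    on #transactions (transaction_id, last_modified_date desc);

-- Delete every row that is older than the newest row for its transaction_id.
delete t
from #transactions t
where t.last_modified_date < (select max(t2.last_modified_date)
                              from #transactions t2
                              where t2.transaction_id = t.transaction_id);

-- One row per transaction_id remains: the rows dated
-- 2020-01-05, 2020-02-01 and 2020-03-03.
select transaction_id, last_modified_date, amount
from #transactions
order by transaction_id;

drop table #transactions;
```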
All that said, your query will be quite expensive if many rows are being deleted ("many" might even be a few percent). In that case, a temporary table solution might be better:
select t.*
into temp_transactions
from transactions t
where t.last_modified_date = (select max(t2.last_modified_date)
from transactions t2
where t2.transaction_id = t.transaction_id
);
truncate table transactions; -- backup first!
insert into transactions
select *
from temp_transactions;
Of course, the logic will be more complicated if you have identity columns or triggers that set values on the table.
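As an illustration of the identity-column case, here is a rough T-SQL sketch of the same save, truncate, reload sequence. It assumes a hypothetical identity column id and payload column amount, and it runs against scratch temp tables (#transactions, #keepers) rather than the real table. Triggers are not covered.

```sql
-- Scratch table with an identity column; id and amount are hypothetical columns.
create table #transactions (
    id                 int identity(1,1) primary key,
    transaction_id     int           not null,
    last_modified_date datetime2(0)  not null,
    amount             decimal(10,2) null
);

insert into #transactions (transaction_id, last_modified_date, amount)
values (1, '2020-01-01', 10.00),
       (1, '2020-01-05', 12.00),
       (2, '2020-02-01', 20.00);

-- Save only the newest row per transaction_id.
select t.*
into #keepers
from #transactions t
where t.last_modified_date = (select max(t2.last_modified_date)
                              from #transactions t2
                              where t2.transaction_id = t.transaction_id);

begin transaction;

truncate table #transactions;

-- Explicit identity values are accepted only while identity_insert is on,
-- and the insert must then list its columns by name.
set identity_insert #transactions on;

insert into #transactions (id, transaction_id, last_modified_date, amount)
select id, transaction_id, last_modified_date, amount
from #keepers;

set identity_insert #transactions off;

commit transaction;

-- The surviving rows keep the id values they had before the truncate.
select id, transaction_id, last_modified_date, amount
from #transactions
order by id;

drop table #keepers;
drop table #transactions;
```

Wrapping the truncate and reload in one transaction means a failure partway through rolls back to the full table instead of leaving it empty.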
For this query:
with CTE_Duplicates as (
select
transaction_id,
row_number()
over(partition by transaction_id order by last_modified_date desc ) rownumber
from TRANSACTIONS
)
delete from CTE_Duplicates where rownumber!=1;
You just want a composite index on (transaction_id, last_modified_date):
create index idx_transactions_dup on transactions(transaction_id, last_modified_date);
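To see whether the index actually helps, one option is to run the delete with SQL Server's I/O and timing statistics switched on, once before and once after creating the index, and roll the delete back each time. This is only a sketch, and it assumes you are working on a test copy of the TRANSACTIONS table from the question, since rolling back a large delete is itself expensive.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
set statistics io on;
set statistics time on;

begin transaction;

with CTE_Duplicates as (
    select transaction_id,
           row_number() over (partition by transaction_id
                              order by last_modified_date desc) as rownumber
    from TRANSACTIONS
)
delete from CTE_Duplicates
where rownumber > 1;

-- Undo the delete; this run was only for measurement.
rollback transaction;

set statistics io off;
set statistics time off;
```

Comparing the logical reads reported in the two runs gives a more stable signal than elapsed time alone, which varies with caching and server load.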
No matter what solution you choose, probably the best thing you can do is add a compound index on (transaction_id, last_modified_date). After doing that, I would go with an aggregate function over a window function (given the partitioning and ordering that window functions perform, I am not sure how well they would take advantage of the ideal index):
; WITH keepers AS (
SELECT transaction_id, MAX(last_modified_date) AS last_modified_date
FROM transactions
GROUP BY transaction_id
)
DELETE t
FROM transactions AS t
LEFT JOIN keepers AS k
ON t.transaction_id = k.transaction_id
AND t.last_modified_date = k.last_modified_date
WHERE k.transaction_id IS NULL
;
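One behaviour worth checking with a MAX-based delete like this: if two rows for the same transaction_id share the latest last_modified_date, both match the keepers row and both survive, whereas the row_number() version keeps exactly one. The sketch below reproduces that on a scratch temp table (#transactions and its sample rows are made up) and finishes with a duplicate check that can be run after any of the deletes in this thread.

```sql
-- Scratch table with a tie on the newest date for transaction_id 1.
create table #transactions (
    transaction_id     int          not null,
    last_modified_date datetime2(0) not null
);

insert into #transactions (transaction_id, last_modified_date)
values (1, '2020-01-01'),
       (1, '2020-01-05'),
       (1, '2020-01-05'),   -- same timestamp as the previous row
       (2, '2020-02-01');

with keepers as (
    select transaction_id, max(last_modified_date) as last_modified_date
    from #transactions
    group by transaction_id
)
delete t
from #transactions as t
left join keepers as k
       on t.transaction_id = k.transaction_id
      and t.last_modified_date = k.last_modified_date
where k.transaction_id is null;

-- Duplicate check: any row returned here is a transaction_id that still
-- has more than one record. With the sample data it returns
-- transaction_id 1 with row_count 2, because of the tie.
select transaction_id, count(*) as row_count
from #transactions
group by transaction_id
having count(*) > 1;

drop table #transactions;
```

If ties like this can occur in the real data, the row_number() form, or an extra tie-breaking column, is the safer choice.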