
Create SQL indexes for speed

I am removing duplicates from a table that has a transaction_id column and a last_modified_date column (query below). There should be one record per transaction_id, so I need to remove the duplicates and keep the most recently modified record for each transaction_id.

The query works but is slow.

The question is: what index should I create to speed up the query?

With CTE_Duplicates as
(
   select 
       transaction_id, 
       row_number() over (partition by transaction_id order by last_modified_date desc) rownumber 
   from 
       TRANSACTIONS 
)  
delete from CTE_Duplicates 
where rownumber != 1;  

Thanks!

Vald

For your version of the query:

With CTE_Duplicates as (
    select t.*,
           row_number() over (partition by transaction_id order by last_modified_date desc) as rownumber
    from TRANSACTIONS t
   )
delete from CTE_Duplicates
    where rownumber > 1;

You want an index on (transaction_id, last_modified_date desc). With that same index, however, it might be faster to phrase the query as:

delete t from transactions t
    where t.last_modified_date < (select max(t2.last_modified_date)
                                  from transactions t2
                                  where t2.transaction_id = t.transaction_id
                                 );
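
For reference, the index described above could be created roughly as follows. This is a minimal sketch assuming SQL Server; the index name idx_transactions_id_date_desc is made up for illustration.

```sql
-- Hypothetical index name; the key columns come from the answer above.
-- transaction_id leads so that all rows of one partition sit together,
-- and last_modified_date is stored descending to match the ORDER BY.
create index idx_transactions_id_date_desc
    on transactions (transaction_id, last_modified_date desc);
```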

All that said, your query will be quite expensive if many rows are being deleted ("many" might be as little as a few percent). In that case, a temporary table solution might be better:

select t.*
into temp_transactions
from transactions t
where t.last_modified_date = (select max(t2.last_modified_date)
                              from transactions t2
                              where t2.transaction_id = t.transaction_id
                             );

truncate table transactions;  -- backup first!

insert into transactions
    select *
    from temp_transactions;

Of course, the logic will be more complicated if you have identity columns or triggers that set values on the table.
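
As a rough illustration of the identity-column case, the reload step might look like this in SQL Server. The column names id and amount are hypothetical (the thread only names transaction_id and last_modified_date), and the sketch assumes temp_transactions has already been filled by the select ... into above.

```sql
-- Assumes SQL Server and a hypothetical identity column named id.
-- Run inside a transaction so a failed reload can be rolled back.
begin transaction;

truncate table transactions;

-- Allow explicit values to be written to the identity column.
set identity_insert transactions on;

-- With identity_insert on, the insert needs an explicit column list.
insert into transactions (id, transaction_id, last_modified_date, amount)
    select id, transaction_id, last_modified_date, amount
    from temp_transactions;

set identity_insert transactions off;

commit transaction;
```

Note that truncate table is rejected if another table references transactions through a foreign key, and it does not fire delete triggers, so either of those would call for a different reload strategy.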

For this query:

with CTE_Duplicates as (
    select 
        transaction_id, 
        row_number() 
            over(partition by transaction_id order by last_modified_date desc ) rownumber 
    from TRANSACTIONS 
) 
delete from CTE_Duplicates where rownumber!=1;

You just want a composite index on (transaction_id, last_modified_date).

create index idx_transactions_dup on transactions(transaction_id, last_modified_date);
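
One way to check whether the index actually helps is to compare I/O and timing statistics before and after creating it. The following is only a sketch, assuming SQL Server; the delete is wrapped in a transaction that is rolled back so that the test run leaves the data unchanged.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
set statistics io on;
set statistics time on;

begin transaction;

with CTE_Duplicates as (
    select
        transaction_id,
        row_number()
            over (partition by transaction_id order by last_modified_date desc) rownumber
    from TRANSACTIONS
)
delete from CTE_Duplicates where rownumber != 1;

-- Undo the delete; only the reported measurements are of interest.
rollback transaction;

set statistics io off;
set statistics time off;
```

The delete still does its full work and holds locks until the rollback finishes, so this is better tried on a copy of the table than on a busy production system.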

No matter which solution you choose, probably the best thing you can do is add a compound index on (transaction_id, last_modified_date). After doing that, I would use an aggregate function rather than a window function (given the partitioning and ordering that window functions perform, I am not sure how well they would take advantage of the ideal index):

; WITH keepers AS (
   SELECT transaction_id, MAX(last_modified_date) AS last_modified_date
   FROM transactions 
   GROUP BY transaction_id
)
DELETE t 
FROM transactions AS t
LEFT JOIN keepers AS k
   ON t.transaction_id = k.transaction_id 
   AND t.last_modified_date = k.last_modified_date
WHERE k.transaction_id IS NULL
;
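
Whichever delete you run, a quick check that no transaction_id still has more than one row can be useful. The max()-based variants keep every row that shares the latest last_modified_date, so exact ties would survive them, whereas the row_number() version always keeps exactly one row per transaction_id.

```sql
-- Lists any transaction_id that still has more than one row.
-- An empty result means the de-duplication is complete.
select transaction_id, count(*) as row_count
from transactions
group by transaction_id
having count(*) > 1;
```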
