Create SQL index to improve speed

Question

I am removing duplicates from a table with a transaction_id column and a last_modified_date column (see the query below). The idea is that I should have one record per transaction_id, so I need to remove duplicates, keeping the last modified record for a given transaction_id.

The query works but is slow.

The question is: what index should I create to speed up the query execution time?

With CTE_Duplicates as
(
   select 
       transaction_id, 
       row_number() over (partition by transaction_id order by last_modified_date desc) rownumber 
   from 
       TRANSACTIONS 
)  
delete from CTE_Duplicates 
where rownumber != 1;

Thanks!

Vald

Answer 1

For your version of the query:

With CTE_Duplicates as (
    select t.*,
           row_number() over (partition by transaction_id order by last_modified_date desc) as rownumber
    from TRANSACTIONS t  -- alias t is required because the select list uses t.*
   )
delete from CTE_Duplicates
    where rownumber > 1;

You want an index on (transaction_id, last_modified_date desc). However, with that same index, it might be faster to phrase the query as:

-- delete every row that is older than the newest row for its transaction_id
delete t from transactions t
    where t.last_modified_date < (select max(t2.last_modified_date)
                                  from transactions t2
                                  where t2.transaction_id = t.transaction_id
                                 );
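
As a minimal sketch, the index described above could be created as follows. This assumes SQL Server, which the delete-from-CTE syntax suggests; the index name ix_transactions_id_modified is made up for illustration, while the table and column names come from the question.

```sql
-- Hypothetical index name; table and columns are taken from the question.
-- The DESC key order matches "order by last_modified_date desc" in the CTE.
CREATE INDEX ix_transactions_id_modified
    ON transactions (transaction_id, last_modified_date DESC);
```

With transaction_id as the leading key, all rows for one transaction sit next to each other in the index, already ordered by date, so both the window function and the correlated max() can read them without a separate sort.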

All that said, your query will be quite expensive if many rows are being deleted ("many" might even be a few percent). In that case, a temporary table solution might be better:

select t.*
into temp_transactions
from transactions t
where t.last_modified_date = (select max(t2.last_modified_date)
                              from transactions t2
                              where t2.transaction_id = t.transaction_id
                             );

truncate table transactions;  -- backup first!

insert into transactions
    select *
    from temp_transactions;

Of course, the logic will be more complicated if you have identity columns or triggers that set values on the table.
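
One way to make the truncate-and-reload step less risky is to run it inside a single transaction, so a failed insert does not leave transactions empty. The following is only a sketch under assumptions: SQL Server 2012 or later (for THROW), temp_transactions already populated by the select ... into above, and no identity columns or foreign keys referencing the table.

```sql
-- Sketch only: assumes temp_transactions already holds the rows to keep.
BEGIN TRY
    BEGIN TRANSACTION;

    -- In SQL Server, TRUNCATE TABLE is rolled back if the transaction is.
    TRUNCATE TABLE transactions;

    INSERT INTO transactions
    SELECT *
    FROM temp_transactions;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the truncate if anything failed, then re-raise the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```

This does not replace the backup mentioned above; it only narrows the window in which the table can be left empty.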

Answer 2

For this query:

with CTE_Duplicates as (
    select 
        transaction_id, 
        row_number() 
            over(partition by transaction_id order by last_modified_date desc ) rownumber 
    from TRANSACTIONS 
) 
delete from CTE_Duplicates where rownumber!=1;

You just want a composite index on (transaction_id, last_modified_date).

create index idx_transactions_dup on transactions(transaction_id, last_modified_date);

Answer 3

No matter which solution you choose, probably the best thing you can do is add a compound index on (transaction_id, last_modified_date). After doing that, I would go with an aggregate function rather than a window function (given window functions' partitioning and ordering abilities, I am not sure how well they would take advantage of the ideal index)...

; WITH keepers AS (
   SELECT transaction_id, MAX(last_modified_date) AS last_modified_date
   FROM transactions 
   GROUP BY transaction_id
)
DELETE t 
FROM transactions AS t
LEFT JOIN keepers AS k
   ON t.transaction_id = k.transaction_id 
   AND t.last_modified_date = k.last_modified_date
WHERE k.transaction_id IS NULL
;
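
One thing to keep in mind with a MAX-based delete like this: if two rows for the same transaction_id share the same latest last_modified_date, both match the keepers row and both survive, whereas row_number() keeps exactly one. A quick check after the cleanup, using only the table and column names from the question:

```sql
-- List any transaction_id that still has more than one row after the delete.
SELECT transaction_id, COUNT(*) AS copies
FROM transactions
GROUP BY transaction_id
HAVING COUNT(*) > 1;
```

An empty result means every transaction_id is down to a single row. Run before the delete, the same query gives a rough idea of how many transactions have duplicates at all.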

3 answers

Answer 1
1 2020-01-07 00:00:21

Answer 2
0 2020-01-06 23:56:55

Answer 3
0 2020-01-07 00:12:46
