
SQL Oracle script optimization

I have a table TRANSACTIONS with almost 30 million transactions (13 columns). How can I optimize the following code? I tried a self-join, but it seemed to be less effective.

Logic: I want to get the last transaction for each SENDER/RECEIVER_2 pair if RECEIVER_2 exists, otherwise for each SENDER/RECEIVER pair, and calculate some statistics over 10/30/90 days.

SELECT T.* FROM
(SELECT T.*, row_number() over (partition by T.SENDER, (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END) order by T.DATE_ACCEPT desc) as seqnum 
FROM 
(
SELECT T.*
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) CNT_10
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_30
      ,(SELECT COUNT(DISTINCT T2.ID_TRAN)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90  AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) CNT_90 
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
             AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_10
      ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_30
        ,(SELECT DISTINCT AVG(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END) OVER()
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        GROUP BY T2.ID_TRAN, (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        ) AVG_AMOUNT_90
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER
        ) MAX_AMOUNT_10
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_30
        ,(SELECT MAX(CASE WHEN T.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END)
        FROM TRANSACTIONS T2
        WHERE T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND
              T2.DATE_ACCEPT < T.DATE_ACCEPT AND
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T2.RECEIVER ELSE T2.RECEIVER_2 END) =
              (CASE WHEN T.RECEIVER_2 IS NULL THEN T.RECEIVER ELSE T.RECEIVER_2 END)
              AND
              T2.SENDER = T.SENDER 
        ) MAX_AMOUNT_90
FROM TRANSACTIONS T
) T ) T
WHERE T.SEQNUM = 1

I also created an index on (SENDER, DATE_ACCEPT).

Query plan

Table example

Having an index on (SENDER, DATE_ACCEPT) will probably help.

You can also simplify and accelerate the query by using a single LATERAL join with conditional aggregation.
It lets you calculate several COUNT/AVG/MAX values in one subquery instead of one subquery per value.

For example:

-- Oracle 12c or later (CROSS JOIN LATERAL)
SELECT T.*, LT.*
FROM (
    -- latest transaction per sender / effective receiver
    SELECT SENDER, RECEIVER, RECEIVER_2, DATE_ACCEPT, AMOUNT, AMOUNT_2
    FROM (
        SELECT SENDER, RECEIVER, RECEIVER_2, DATE_ACCEPT, AMOUNT, AMOUNT_2,
               ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER)
                                  ORDER BY DATE_ACCEPT DESC) AS RN
        FROM TRANSACTIONS
    ) TRANS
    WHERE RN = 1
) T
CROSS JOIN LATERAL (
    -- one pass over the last 90 days of history; each measure picks its own window
    SELECT
        COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                            THEN T2.ID_TRAN END) AS CNT_10,
        COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                            THEN T2.ID_TRAN END) AS CNT_30,
        COUNT(DISTINCT CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                            THEN T2.ID_TRAN END) AS CNT_90,
        NVL(AVG(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_10,
        NVL(AVG(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_30,
        NVL(AVG(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS AVG_AMOUNT_90,
        NVL(MAX(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 10 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_10,
        NVL(MAX(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 30 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_30,
        NVL(MAX(CASE WHEN T2.DATE_ACCEPT > T.DATE_ACCEPT - 90 AND T2.DATE_ACCEPT < T.DATE_ACCEPT
                     THEN CASE WHEN T2.RECEIVER_2 IS NULL THEN T2.AMOUNT ELSE T2.AMOUNT_2 END END), 0) AS MAX_AMOUNT_90
    FROM TRANSACTIONS T2
    WHERE T2.SENDER = T.SENDER
      AND T2.DATE_ACCEPT > T.DATE_ACCEPT - 90
      AND NVL(T2.RECEIVER_2, T2.RECEIVER) = NVL(T.RECEIVER_2, T.RECEIVER)
) LT;
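+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+
|SENDER|RECEIVER|RECEIVER_2|DATE_ACCEPT|AMOUNT|AMOUNT_2|CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|
+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+
|1     |2       |3         |30-MAR-21  |10    |20      |1     |2     |3     |11.2         |21.65        |45.5         |11.2         |32.1         |93.2         |
+------+--------+----------+-----------+------+--------+------+------+------+-------------+-------------+-------------+-------------+-------------+-------------+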

Demo on db<>fiddle.

Your RECEIVER / RECEIVER_2 logic only makes the query confusing; it is not the cause of your performance problems.

You'll get the same problems with a simple model containing only SENDER, RECEIVER, AMOUNT and DATE_ACCEPT, which is what I use in the example; adapt it for your purpose.

First you should understand what causes the problem.

You are joining a large transaction table with its own history, producing a big intermediate result only to aggregate it and calculate the aggregated measures.

The key idea is to aggregate first and join back to the transaction table in a second step.

The query below first calculates max_date_accept for each sender/receiver pair and then uses it to calculate the aggregate measures over the history window (the example uses a 10-day window; adapt as required).

Note that I copy your logic of ignoring the last transaction in the calculation by adding the predicate DATE_ACCEPT < max_date_accept.

This leads to a NULL result for AVG and MAX (and a count of 0) if there is only one transaction in the calculated time interval, which is probably not what you want.

with trans as (
select 
 ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
 max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
from TRANSACTIONS t
)
select 
  SENDER, RECEIVER,
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
  avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
  max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
from trans  
group by SENDER, RECEIVER; 

It could be that this result is already what you want. But if you really want the complete set of columns from the transaction table with the values of the latest transaction, simply join the aggregated result to the transaction table:

with trans as (
select 
 ID_TRAN, SENDER,  RECEIVER, AMOUNT,  DATE_ACCEPT,
 max(DATE_ACCEPT) over (partition by T.SENDER,  T.RECEIVER) max_date_accept
from TRANSACTIONS t
),
agg as (
select 
  SENDER, RECEIVER,
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
  avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
  max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10
from trans  
group by SENDER, RECEIVER),
trans2 as (
select
 t.*,
 row_number() over (partition by SENDER, RECEIVER order by DATE_ACCEPT desc) as seqnum
from TRANSACTIONS t)
select
 trans2.*,
 agg.CNT_10, agg.AVG_AMOUNT_10, agg.MAX_AMOUNT_10
from trans2
join agg on trans2.SENDER = agg.SENDER and trans2.RECEIVER = agg.RECEIVER
where seqnum = 1; 
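Both queries above compute only the 10-day window. A sketch of the aggregation step extended to the 10-, 30- and 90-day windows the question asks for, still on the simplified SENDER/RECEIVER model and assuming DATE_ACCEPT is a DATE column (so max_date_accept - 30 means 30 days earlier). The extra WHERE is an optional refinement not taken from the answer: it only drops rows that no window can use.

```sql
with trans as (
  select
    ID_TRAN, SENDER, RECEIVER, AMOUNT, DATE_ACCEPT,
    -- date of the latest transaction of each sender / receiver pair
    max(DATE_ACCEPT) over (partition by SENDER, RECEIVER) max_date_accept
  from TRANSACTIONS
)
select
  SENDER, RECEIVER,
  -- every measure excludes the latest transaction itself (< max_date_accept)
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10,
  count(distinct case when DATE_ACCEPT > max_date_accept - 30 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_30,
  count(distinct case when DATE_ACCEPT > max_date_accept - 90 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_90,
  avg(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_10,
  avg(case when DATE_ACCEPT > max_date_accept - 30 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_30,
  avg(case when DATE_ACCEPT > max_date_accept - 90 and DATE_ACCEPT < max_date_accept then AMOUNT end) AVG_AMOUNT_90,
  max(case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_10,
  max(case when DATE_ACCEPT > max_date_accept - 30 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_30,
  max(case when DATE_ACCEPT > max_date_accept - 90 and DATE_ACCEPT < max_date_accept then AMOUNT end) MAX_AMOUNT_90
from trans
-- rows older than the widest (90-day) window cannot contribute to any measure;
-- each pair keeps at least its latest row, so no pair disappears from the result
where DATE_ACCEPT > max_date_accept - 90
group by SENDER, RECEIVER;
```

To get the full transaction columns as well, this select can take the place of the agg CTE in the previous query.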

Performance note: check the execution plan of the query.

You should see only TABLE ACCESS FULL and HASH JOIN. Queries of this type often run into problems when they use NESTED LOOPS or FILTER joins with index access.
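A minimal way to look at the plan in Oracle is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. Keep in mind that this shows the optimizer's estimated plan, which can differ from the plan actually used at run time. The statement explained here is the 10-day aggregate query from above; substitute the query you want to check.

```sql
-- writes the estimated plan for the statement into PLAN_TABLE
explain plan for
with trans as (
  select
    ID_TRAN, SENDER, RECEIVER, AMOUNT, DATE_ACCEPT,
    max(DATE_ACCEPT) over (partition by SENDER, RECEIVER) max_date_accept
  from TRANSACTIONS
)
select
  SENDER, RECEIVER,
  count(distinct case when DATE_ACCEPT > max_date_accept - 10 and DATE_ACCEPT < max_date_accept then ID_TRAN end) CNT_10
from trans
group by SENDER, RECEIVER;

-- formats and shows the most recently explained plan;
-- check which access paths and join methods appear in it
select * from table(dbms_xplan.display);
```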

The major problem in your query is the CASE expression in the predicates: it prevents an index from being used for those predicates. Therefore you need to use a virtual column:

ALTER TABLE Transactions ADD rec AS (
     CASE WHEN RECEIVER_2 IS NULL 
     THEN RECEIVER ELSE RECEIVER_2 END
);

The second step is to create an index on this column:

CREATE INDEX ix_transactions_sender_rec
    ON Transactions(sender, rec, date_accept);

However, the index may still not be used because of how the query is written. Replace the CASE expressions with the newly created column rec, and also rewrite the greatest-per-group solution as a self-join. Here is a reduced SQL example of how to do it:

-- reduced example: only CNT_10 is shown
select t.*,
    (
           select count(DISTINCT T2.id_tran)
           from transactions T2
           where T2.date_accept > T.date_accept - 10
                 AND T2.date_accept < T.date_accept
                 AND T2.rec = T.rec
                 AND T2.sender = T.sender
    ) CNT_10
from (
    -- latest date per sender / rec
    select sender, rec, max(date_accept) as date_accept
    from transactions
    group by sender, rec
) tmax
join transactions t on t.sender = tmax.sender and
                       t.rec = tmax.rec and
                       t.date_accept = tmax.date_accept;

And if you want your statistical subqueries to be super fast, then also add the other columns they use to the index:

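-- replaces the narrower index of the same name created above
DROP INDEX ix_transactions_sender_rec;
CREATE INDEX ix_transactions_sender_rec
    ON Transactions(sender, rec, date_accept, id_tran, amount);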

Are you aware of the windowing clause of analytic functions?

I don't get the logic of your query, but I guess it might be possible without any self-joins. Have a look at this query; it could be a starting point:

SELECT
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW) AS CNT_10,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS CNT_30,
    COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS CNT_90,
    AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS AVG_30,
    AVG(NVL(T.AMOUNT_2, T.AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND CURRENT ROW) AS AVG_90
FROM TRANSACTIONS T;

Note that RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND CURRENT ROW is equivalent to RANGE INTERVAL '10' DAY PRECEDING.
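The windowing clause can also be brought closer to the question's own rules. A sketch under these assumptions: DATE_ACCEPT is a DATE (second precision), and the pair key NVL(RECEIVER_2, RECEIVER) and amount NVL(AMOUNT_2, AMOUNT) follow the query above rather than the question's CASE expressions. Ending each frame at INTERVAL '1' SECOND PRECEDING leaves out the current row (and any row with the same DATE_ACCEPT), which mimics T2.DATE_ACCEPT < T.DATE_ACCEPT, and ROW_NUMBER keeps only the latest transaction per pair. Two differences remain: a RANGE frame includes a row exactly 10 days older (the question uses a strict >), and Oracle does not allow COUNT(DISTINCT ...) together with ORDER BY in the window, so duplicate ID_TRAN values are counted separately.

```sql
-- Sketch only: one scan of TRANSACTIONS, no self-join.
SELECT *
FROM (
    SELECT T.*,
           -- 1 = latest transaction of the sender / effective receiver pair
           ROW_NUMBER() OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER)
                              ORDER BY DATE_ACCEPT DESC) AS SEQNUM,
           -- frames stop 1 second before the current row, so only earlier
           -- transactions are included
           COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS CNT_10,
           COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS CNT_30,
           COUNT(ID_TRAN) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS CNT_90,
           AVG(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS AVG_AMOUNT_10,
           AVG(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS AVG_AMOUNT_30,
           AVG(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS AVG_AMOUNT_90,
           MAX(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS MAX_AMOUNT_10,
           MAX(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS MAX_AMOUNT_30,
           MAX(NVL(AMOUNT_2, AMOUNT)) OVER (PARTITION BY SENDER, NVL(RECEIVER_2, RECEIVER) ORDER BY DATE_ACCEPT
               RANGE BETWEEN INTERVAL '90' DAY PRECEDING AND INTERVAL '1' SECOND PRECEDING) AS MAX_AMOUNT_90
    FROM TRANSACTIONS T
)
WHERE SEQNUM = 1;
```

Where no earlier transaction falls inside a window, COUNT returns 0 and AVG/MAX return NULL, as the question's scalar subqueries do. If DATE_ACCEPT is a TIMESTAMP with fractional seconds, the 1-second gap would also skip rows less than a second older, so the end bound would need adjusting.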

Another note: when I run your query on the sample data, I get

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|ID_TRAN|SENDER|RECEIVER|RECEIVER_2|AMOUNT|AMOUNT_2|DATE_ACCEPT        |CNT_10|CNT_30|CNT_90|AVG_AMOUNT_10|AVG_AMOUNT_30|AVG_AMOUNT_90|MAX_AMOUNT_10|MAX_AMOUNT_30|MAX_AMOUNT_90|SEQNUM|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1      |00010 |22222   |1112      |3000  |1000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|1      |00010 |22222   |2114      |3000  |2000    |16.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|2      |01236 |45872   |          |4000  |        |01.04.2021 22:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
|3      |45872 |00010   |          |5000  |        |17.04.2021 14:01:00|0     |0     |0     |             |             |             |             |             |             |1     |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

which looks quite pointless.

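Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.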

 
© 2020-2024 STACKOOM.COM