
Oracle: Optimizing twice self-join query

I've been trying to get this query to work efficiently over the past two days. I've learned more about Oracle index behavior, and at this point I think I'm confused about what is supposed to work and what isn't.

Basically, the query sums up values and compares them to the values from yesterday and from last week.

I've played around with breaking it down, toyed with analytic queries in my mind, and changed the order of indexes, but nothing seems to work. All my tests have been on a table with 500K rows; as soon as I run it on a table with 20 million rows it just takes forever.

Any help is greatly appreciated.

I modified the original post to help you help me. :)

CREATE TABLE TABLE_1
(ORDER_LINE_ID NUMBER, OFFSET NUMBER, BREAK_ID NUMBER, ZONE NUMBER, NETWORK NUMBER, HOUR_OF_DAY NUMBER, START_TIME DATE, END_TIME DATE, SUCCESS NUMBER,
  CONSTRAINT "TABLE_1_PK" PRIMARY KEY (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, HOUR_OF_DAY));

-- SUCCESS is already aggregated during the insert
-- These are last week's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1, 0, 1, 1, 1, 2016042001, TO_DATE('04/20/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/20/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (1, 30, 1, 1, 1, 2016042001, TO_DATE('04/20/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/20/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2, 0, 1, 1, 1, 2016042001, TO_DATE('04/20/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/20/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (2, 30, 1, 1, 1, 2016042001, TO_DATE('04/20/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/20/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);

-- These are yesterday's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3, 0, 1, 1, 1, 2016042601, TO_DATE('04/26/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/26/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (3, 30, 1, 1, 1, 2016042601, TO_DATE('04/26/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/26/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 2);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4, 0, 1, 1, 1, 2016042601, TO_DATE('04/26/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/26/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (4, 30, 1, 1, 1, 2016042601, TO_DATE('04/26/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/26/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);

-- These are today's records
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5, 0, 1, 1, 1, 2016042701, TO_DATE('04/27/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/27/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);
INSERT INTO TABLE_1 (ORDER_LINE_ID, OFFSET, BREAK_ID, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME, SUCCESS)
VALUES (5, 30, 1, 1, 1, 2016042701, TO_DATE('04/27/2016 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('04/27/2016 02:00:00', 'MM/DD/YYYY HH24:MI:SS'), 1);

-- Original twice self-join query
SELECT CURRENT_DAY.BREAK_ID, CURRENT_DAY.ORDER_LINE_ID, CURRENT_DAY.HOUR_OF_DAY, CURRENT_DAY.OFFSET, CURRENT_DAY.ZONE, CURRENT_DAY.NETWORK, CURRENT_DAY.START_TIME, CURRENT_DAY.END_TIME,
       SUM(CURRENT_DAY.SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM TABLE_1 CURRENT_DAY
LEFT OUTER JOIN (
  SELECT SUM(SUCCESS) YESTERDAY_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1
  GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
) YESTERDAY
  ON YESTERDAY.START_TIME + 1 = CURRENT_DAY.START_TIME
  AND YESTERDAY.END_TIME + 1 = CURRENT_DAY.END_TIME
  AND YESTERDAY.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
  AND YESTERDAY.NETWORK = CURRENT_DAY.NETWORK
  AND YESTERDAY.ZONE = CURRENT_DAY.ZONE
LEFT OUTER JOIN (
  SELECT SUM(SUCCESS) LAST_WEEK_SUCCESS, ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME FROM TABLE_1
  GROUP BY ZONE, NETWORK, HOUR_OF_DAY, START_TIME, END_TIME
) LAST_WEEK
  ON LAST_WEEK.START_TIME + 7 = CURRENT_DAY.START_TIME
  AND LAST_WEEK.END_TIME + 7 = CURRENT_DAY.END_TIME
  AND LAST_WEEK.HOUR_OF_DAY = CURRENT_DAY.HOUR_OF_DAY
  AND LAST_WEEK.NETWORK = CURRENT_DAY.NETWORK
  AND LAST_WEEK.ZONE = CURRENT_DAY.ZONE
GROUP BY CURRENT_DAY.BREAK_ID, CURRENT_DAY.ORDER_LINE_ID, CURRENT_DAY.HOUR_OF_DAY, CURRENT_DAY.OFFSET, CURRENT_DAY.ZONE, CURRENT_DAY.NETWORK, CURRENT_DAY.START_TIME, CURRENT_DAY.END_TIME;

-- Using Analytic Query (thank you to MT0)
SELECT BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME, SUM(SUCCESS), SUM(YESTERDAY_SUCCESS), SUM(LAST_WEEK_SUCCESS)
FROM (
  SELECT TABLE_1.*,
         SUM( SUCCESS )
           OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
                  ORDER BY START_TIME
                  RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
                ) AS YESTERDAY_SUCCESS,
         SUM( SUCCESS )
           OVER ( PARTITION BY ZONE, NETWORK, HOUR_OF_DAY, TO_CHAR(START_TIME, 'HH24:MI:SS'), TO_CHAR(END_TIME, 'HH24:MI:SS')
                  ORDER BY START_TIME
                  RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY PRECEDING
                ) AS LAST_WEEK_SUCCESS
  FROM TABLE_1
) T1
WHERE SYSDATE - INTERVAL '12' HOUR <= START_TIME
AND START_TIME < SYSDATE - INTERVAL '1' HOUR
GROUP BY BREAK_ID, ORDER_LINE_ID, HOUR_OF_DAY, OFFSET, ZONE, NETWORK, START_TIME, END_TIME;

I must say thank you for the help in bringing this question up to something I hope is more understandable. Everything works as expected, but performance could use some tuning:

1.8 seconds on a table with 500K rows

400 seconds on a table with 20 million rows

I also want to add some execution plans provided by Oracle. I'm having trouble tuning the performance.
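For reference, runtime statistics like the Starts, A-Rows and Buffers columns in the plans below are typically captured with the GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR; here is a minimal sketch of that approach (the exact format options are my assumption and not stated in the original post):

-- Run the statement with the hint (substitute the actual query being tuned for the COUNT(*))
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM TABLE_1;
-- Then dump the last cursor's actual row counts and buffer statistics
SELECT * FROM TABLE( DBMS_XPLAN.DISPLAY_CURSOR( NULL, NULL, 'ALLSTATS LAST' ) );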

-- using twice self join
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |      1 |        |     50 |00:00:00.84 |   99875 |    217 |   1705 |       |       |          |
|   1 |  HASH GROUP BY                  |                           |      1 |   6711 |     50 |00:00:00.84 |   99875 |    217 |   1705 |  1616K|   995K|          |
|*  2 |   FILTER                        |                           |      1 |        |    119K|00:00:00.65 |   99875 |      0 |      0 |       |       |          |
|   3 |    NESTED LOOPS OUTER           |                           |      1 |     54M|    119K|00:00:00.64 |   99875 |      0 |      0 |       |       |          |
|*  4 |     HASH JOIN OUTER             |                           |      1 |    109 |    119K|00:00:00.52 |   99875 |      0 |      0 |    13M|  2093K|     1/0/0|
|   5 |      TABLE ACCESS BY INDEX ROWID| TABLE_1_IDX               |      1 |    109 |    119K|00:00:00.14 |   85908 |      0 |      0 |       |       |          |
|*  6 |       INDEX RANGE SCAN          | START_TIME_IDX            |      1 |    109 |    119K|00:00:00.02 |     320 |      0 |      0 |       |       |          |
|   7 |      VIEW                       |                           |      1 |   1250 |  29311 |00:00:00.23 |   13967 |      0 |      0 |       |       |          |
|   8 |       HASH GROUP BY             |                           |      1 |   1250 |  29311 |00:00:00.22 |   13967 |      0 |      0 |  3008K|  1094K|     1/0/0|
|*  9 |        FILTER                   |                           |      1 |        |  88627 |00:00:00.20 |   13967 |      0 |      0 |       |       |          |
|* 10 |         TABLE ACCESS FULL       | TABLE_1                   |      1 |   1250 |  88627 |00:00:00.19 |   13967 |      0 |      0 |       |       |          |
|  11 |     VIEW                        |                           |    119K|    499K|      0 |00:00:00.10 |       0 |      0 |      0 |       |       |          |
|  12 |      SORT GROUP BY              |                           |    119K|    499K|      0 |00:00:00.08 |       0 |      0 |      0 |  1024 |  1024 |     1/0/0|
|* 13 |       FILTER                    |                           |    119K|        |      0 |00:00:00.02 |       0 |      0 |      0 |       |       |          |
|  14 |        TABLE ACCESS FULL        | TABLE_1                   |      0 |    499K|      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(SYSDATE@!-17<SYSDATE@!-16)
   4 - access("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY" 
              AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+1 AND 
              "T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+1)
   6 - access("T1"."START_TIME">=SYSDATE@!-17 AND "T1"."START_TIME"<SYSDATE@!-16)
   9 - filter(SYSDATE@!-17<SYSDATE@!-16)
  10 - filter((INTERNAL_FUNCTION("START_TIME")+1>=SYSDATE@!-17 AND INTERNAL_FUNCTION("START_TIME")+1<SYSDATE@!-16))
  13 - filter(("YESTERDAY"."ZONE"="T1"."ZONE" AND "YESTERDAY"."NETWORK"="T1"."NETWORK" AND "YESTERDAY"."HOUR_OF_DAY"="T1"."HOUR_OF_DAY" 
              AND "T1"."END_TIME"=INTERNAL_FUNCTION("YESTERDAY"."END_TIME")+7 AND 
              "T1"."START_TIME"=INTERNAL_FUNCTION("YESTERDAY"."START_TIME")+7))

Another execution plan, using the analytic query (thanks again to MT0):

-- using analytic query
-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem |  O/1/M   |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |      1 |        |     50 |00:00:01.51 |   13967 |       |       |          |
|   1 |  HASH GROUP BY        |                  |      1 |    499K|     50 |00:00:01.51 |   13967 |    98M|  7788K|          |
|*  2 |   VIEW                |                  |      1 |    499K|    119K|00:00:01.15 |   13967 |       |       |          |
|   3 |    WINDOW SORT        |                  |      1 |    499K|    499K|00:00:01.43 |   13967 |    66M|  2823K|     1/0/0|
|*  4 |     FILTER            |                  |      1 |        |    499K|00:00:00.16 |   13967 |       |       |          |
|   5 |      TABLE ACCESS FULL| TABLE_1          |      1 |    499K|    499K|00:00:00.12 |   13967 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("T1"."START_TIME">=SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0) AND 
              "T1"."START_TIME"<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO SECOND(0)))
   4 - filter(SYSDATE@!-INTERVAL'+17 00:00:00' DAY(2) TO SECOND(0)<SYSDATE@!-INTERVAL'+16 00:00:00' DAY(2) TO 
              SECOND(0))

As you can see, I added an index on START_TIME, which the self-join query benefits from, although the estimates vs. actuals are off, whereas the analytic query just decides it wants nothing to do with the index. Any ideas, points of reference, or help are greatly appreciated. Thanks in advance, everyone.
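For completeness, a minimal sketch of the START_TIME index referenced in the plan above (only the index name appears in the plan, so the exact definition is an assumption):

-- Assumed definition of the index behind the INDEX RANGE SCAN step
CREATE INDEX START_TIME_IDX ON TABLE_1 (START_TIME);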

It is unclear why you are joining only when there are rows with exactly the same time today and yesterday (or last week), but if you just want the rows that fall between certain times then you can eliminate all the self-joins and do:

SELECT order_line,
       zone,
       network,
       sum(
        CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '1' HOUR
             THEN success
             END
       ) AS total_successes_today,
       sum(
        CASE WHEN SYSDATE - INTERVAL '12' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '1' HOUR
             THEN error
             END
       ) AS total_errors_today,
       sum(
        CASE WHEN SYSDATE - INTERVAL '36' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '25' HOUR
             THEN success
             END
       ) AS total_successes_yesterday,
       sum(
        CASE WHEN SYSDATE - INTERVAL '180' HOUR <= start_time
             AND  start_time < SYSDATE - INTERVAL '169' HOUR
             THEN success
             END
       ) AS total_successes_last_week
FROM   table_1
WHERE  (    SYSDATE - INTERVAL '12' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '1' HOUR ) -- today
OR     (    SYSDATE - INTERVAL '36' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '25' HOUR ) -- yesterday = today + 24 hours
OR     (    SYSDATE - INTERVAL '180' HOUR <= start_time
        AND start_time < SYSDATE - INTERVAL '169' HOUR ) -- last week = today + 7*24 hours

However, if you do want to keep the join on the start and end times then you could use an analytic query:

SELECT order_line,
       zone,
       network,
       SUM( success ),
       SUM( error ),
       SUM( yesterday_success ),
       SUM( last_week_success )
FROM   (
  SELECT t.*,
         SUM( success )
           OVER ( PARTITION BY id,
                               TO_CHAR( start_time, 'HH24:MI:SS' ),
                               TO_CHAR( end_time, 'HH24:MI:SS' )
                  ORDER BY start_time
                  RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY PRECEDING
                ) AS yesterday_success,
         SUM( success )
           OVER ( PARTITION BY id,
                               TO_CHAR( start_time, 'HH24:MI:SS' ),
                               TO_CHAR( end_time, 'HH24:MI:SS' )
                  ORDER BY start_time
                  RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND INTERVAL '7' DAY PRECEDING
                ) AS last_week_success
  FROM   TABLE_1 t
)
WHERE  SYSDATE - INTERVAL '12' HOUR <= start_time 
AND    start_time < SYSDATE - INTERVAL '1' HOUR
GROUP BY
       order_line,
       zone,
       network
ORDER BY
       order_line,
       zone,
       network

You can look at whether you would get speed improvements by using function-based indexes on TO_CHAR( start_time, 'HH24:MI:SS' ) and TO_CHAR( end_time, 'HH24:MI:SS' ).
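A minimal sketch of what such function-based indexes could look like (the index names are illustrative and not part of the original answer):

-- Possible function-based indexes on the time-of-day expressions used in the PARTITION BY
CREATE INDEX table_1_start_tod_idx ON table_1 ( TO_CHAR( start_time, 'HH24:MI:SS' ) );
CREATE INDEX table_1_end_tod_idx   ON table_1 ( TO_CHAR( end_time,   'HH24:MI:SS' ) );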
