How to improve performance of a JOIN of two SCD2 tables in Oracle SQL
I have two tables, both using VALID_FROM / VALID_TO validity logic. Table 1 looks like this:
ID | VALID_FROM | VALID_TO
1 | 01.01.2000 | 04.01.2000
1 | 04.01.2000 | 16.01.2000
1 | 16.01.2000 | 17.01.2000
1 | 17.01.2000 | 19.01.2000
2 | 03.02.2001 | 04.04.2001
2 | 04.04.2001 | 14.03.2001
2 | 14.04.2001 | 18.03.2001
while table 2 looks like this:
ID | VAR | VALID_FROM | VALID_TO
1 | 3 | 01.01.2000 | 17.01.2000
1 | 2 | 17.01.2000 | 19.01.2000
2 | 4 | 03.02.2001 | 14.03.2001
select t1.*,
t2.var
from t1 t1
inner join t2 t2
on t1.id = t2.id
and t1.valid_from >= t2.valid_from
and t1.valid_to <= t2.valid_to;
This join is really slow. I ran it for half a day without success. What can I do to improve performance in this particular case? Please note that I also want to LEFT JOIN the resulting table in later stages. Any help is highly appreciated.
EDIT
Apparently, the information I gave was less than what is generally expected here. Here is the execution plan:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 465M| 23G| | 435K (3)| 00:00:18 |
|* 1 | HASH JOIN | | 465M| 23G| 695M| 435K (3)| 00:00:18 |
| 2 | TABLE ACCESS FULL| TABLE2 | 16M| 501M| | 22961 (2)| 00:00:01 |
| 3 | TABLE ACCESS FULL| TABLE1 | 132M| 3025M| | 145K (2)| 00:00:06 |
--------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$58A6D7F6
2 - SEL$58A6D7F6 / T2@SEL$1
3 - SEL$58A6D7F6 / T1@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_TO"<="T2"."VALID_TO" AND
"T1"."VALID_FROM">="T2"."VALID_FROM")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) "T2"."ID"[VARCHAR2,20],
"T1"."ID"[VARCHAR2,20], "T1"."VALID_TO"[DATE,7],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7], "T1"."ID"[VARCHAR2,20],
"T1"."VALID_FROM"[DATE,7], "T1"."VALID_TO"[DATE,7], "T1"."VALID_FROM"[DATE,7]
2 - "T2"."ID"[VARCHAR2,20],
"T2"."VAR"[VARCHAR2,20], "T2"."VALID_FROM"[DATE,7],
"T2"."VALID_TO"[DATE,7]
3 - "T1"."ID"[VARCHAR2,20], "T1"."VALID_FROM"[DATE,7],
"T1"."VALID_TO"[DATE,7]
Note
-----
- this is an adaptive plan
A good first question is: what is the query expected to return?
Based on your join predicate, it seems you are interested in all versions from table1 whose validity interval is contained in the validity interval of a table2 version. This may be intentional, but more commonly you need all pairs of versions whose intervals intersect.
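To make the difference between the two predicates concrete, here is a small sketch that runs both against SQLite (via Python's sqlite3 module) instead of Oracle. It uses only the ID 1 rows from the question, because some of the ID 2 rows have VALID_TO earlier than VALID_FROM, and adds a hypothetical ID 3 whose single table1 version spans two table2 versions. It assumes VALID_TO is exclusive (half-open intervals), which matches the way consecutive versions share a boundary date in the sample data.

```python
import sqlite3

# ID 1 rows from the question; dates stored as ISO-8601 strings so that string
# comparison matches date order. ID 3 is a HYPOTHETICAL case where one table1
# version spans two table2 versions.
T1_ROWS = [
    ("1", "2000-01-01", "2000-01-04"),
    ("1", "2000-01-04", "2000-01-16"),
    ("1", "2000-01-16", "2000-01-17"),
    ("1", "2000-01-17", "2000-01-19"),
    ("3", "2000-01-01", "2000-01-10"),  # hypothetical
]
T2_ROWS = [
    ("1", "3", "2000-01-01", "2000-01-17"),
    ("1", "2", "2000-01-17", "2000-01-19"),
    ("3", "7", "2000-01-01", "2000-01-05"),  # hypothetical
    ("3", "8", "2000-01-05", "2000-01-10"),  # hypothetical
]

# The question's predicate: the table1 interval lies inside the table2 interval.
CONTAINMENT_SQL = """
SELECT t1.id, t1.valid_from, t1.valid_to, t2.var
FROM t1 JOIN t2
  ON t1.id = t2.id
 AND t1.valid_from >= t2.valid_from
 AND t1.valid_to <= t2.valid_to
ORDER BY 1, 2
"""

# Intersection predicate for half-open intervals [valid_from, valid_to).
# The output interval is clipped to the overlap; Oracle would use GREATEST/LEAST
# where SQLite uses its multi-argument scalar max()/min().
OVERLAP_SQL = """
SELECT t1.id,
       max(t1.valid_from, t2.valid_from) AS valid_from,
       min(t1.valid_to, t2.valid_to) AS valid_to,
       t2.var
FROM t1 JOIN t2
  ON t1.id = t2.id
 AND t1.valid_from < t2.valid_to
 AND t1.valid_to > t2.valid_from
ORDER BY 1, 2, 4
"""

def run(sql):
    """Load the sample rows into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id TEXT, valid_from TEXT, valid_to TEXT)")
    con.execute("CREATE TABLE t2 (id TEXT, var TEXT, valid_from TEXT, valid_to TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", T1_ROWS)
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", T2_ROWS)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print("containment:", run(CONTAINMENT_SQL))
    print("intersection:", run(OVERLAP_SQL))
```

For ID 1 both queries return the same four rows, because every table2 boundary is also a table1 boundary; only the hypothetical ID 3 shows the difference (no rows under containment, two clipped rows under intersection).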
The second question is whether you need to see only the first few rows or all rows of the join.
If you only want to see a few results, simply add AND t1.ID = nnnn to the WHERE clause to limit the query to a sample ID. If you have proper indexes (and there is not an extreme number of rows with this ID), you will get the result quickly, because a NESTED LOOPS join will kick in.
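A minimal sketch of the single-ID case, again using SQLite through Python's sqlite3 as a stand-in for Oracle, with hypothetical index names and synthetic data. SQLite always joins with nested loops, so this only illustrates that a selective ID filter plus a composite index on (id, valid_from) turns the join into a few index lookups; it says nothing about how Oracle's optimizer costs the alternatives.

```python
import sqlite3

# Hypothetical tables and indexes; SQLite stands in for Oracle.
QUERY = """
SELECT t1.id, t1.valid_from, t1.valid_to, t2.var
FROM t1 JOIN t2
  ON t1.id = t2.id
 AND t1.valid_from >= t2.valid_from
 AND t1.valid_to <= t2.valid_to
WHERE t1.id = ?
ORDER BY t1.valid_from
"""

def build_db(n_ids=1000):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id TEXT, valid_from TEXT, valid_to TEXT)")
    con.execute("CREATE TABLE t2 (id TEXT, var TEXT, valid_from TEXT, valid_to TEXT)")
    # Composite indexes: equality on id, range on valid_from.
    con.execute("CREATE INDEX t1_id_from ON t1 (id, valid_from)")
    con.execute("CREATE INDEX t2_id_from ON t2 (id, valid_from)")
    for i in range(n_ids):
        key = str(i)
        # Synthetic data: two table1 versions inside one table2 version per id.
        con.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                        [(key, "2000-01-01", "2000-01-10"),
                         (key, "2000-01-10", "2000-01-20")])
        con.execute("INSERT INTO t2 VALUES (?, ?, ?, ?)",
                    (key, "v" + key, "2000-01-01", "2000-01-20"))
    return con

def plan_and_rows(con, sample_id):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY, (sample_id,))]
    rows = con.execute(QUERY, (sample_id,)).fetchall()
    return plan, rows

if __name__ == "__main__":
    con = build_db()
    plan, rows = plan_and_rows(con, "42")
    print("\n".join(plan))
    print(rows)
```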
To produce the full result, you must consider all rows from both tables. No index will help you select all rows from a table; here a FULL TABLE SCAN is the best option.
To join large row sets, the best approach is a HASH JOIN. NESTED LOOPS (which you are probably using now) are quick for joining a few rows, but stall on large row sets.
The smaller table (table2) is read into memory (hopefully) and built into a hash table. The larger table (table1) is then probed against this hash table to perform the join.
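The build/probe idea can be sketched in plain Python. This illustrates the algorithm only, not how Oracle implements it: the smaller input (table2) is loaded into a dictionary keyed by ID, each table1 row is matched only against the table2 versions with the same ID, and the date comparison is applied afterwards as a residual filter, mirroring the access and filter predicates in the plan.

```python
from collections import defaultdict
from datetime import date

def hash_join(build_rows, probe_rows):
    """Hash-join sketch: equi-join on id plus the containment filter.

    build_rows: (id, var, valid_from, valid_to) tuples from the smaller input (table2).
    probe_rows: (id, valid_from, valid_to) tuples from the larger input (table1).
    """
    # Build phase: a single pass over the smaller input, bucketed by the join key.
    buckets = defaultdict(list)
    for key, var, vf2, vt2 in build_rows:
        buckets[key].append((var, vf2, vt2))

    # Probe phase: a single pass over the larger input. Each row is compared only
    # with the table2 versions of the same id (the plan's access predicate).
    result = []
    for key, vf1, vt1 in probe_rows:
        for var, vf2, vt2 in buckets.get(key, ()):
            # Residual check on the dates (the plan's filter predicate).
            if vf1 >= vf2 and vt1 <= vt2:
                result.append((key, vf1, vt1, var))
    return result

if __name__ == "__main__":
    table2 = [("1", "3", date(2000, 1, 1), date(2000, 1, 17)),
              ("1", "2", date(2000, 1, 17), date(2000, 1, 19))]
    table1 = [("1", date(2000, 1, 1), date(2000, 1, 4)),
              ("1", date(2000, 1, 4), date(2000, 1, 16)),
              ("1", date(2000, 1, 16), date(2000, 1, 17)),
              ("1", date(2000, 1, 17), date(2000, 1, 19))]
    for row in hash_join(table2, table1):
        print(row)
```

Note that the work per probe row grows with the number of table2 versions sharing its ID, so an ID with very many versions stays expensive even in a hash join.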
This is the execution plan you should look for:
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10T| 399T| | 190M(100)| 02:03:47 |
|* 1 | HASH JOIN | | 10T| 399T| 550M| 190M(100)| 02:03:47 |
| 2 | TABLE ACCESS FULL| SCD2 | 16M| 355M| | 39 (93)| 00:00:01 |
| 3 | TABLE ACCESS FULL| SCD1 | 132M| 2395M| | 211 (99)| 00:00:01 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
filter("T1"."VALID_FROM">="T2"."VALID_FROM" AND
"T1"."VALID_TO"<="T2"."VALID_TO")
Provided you are on an enterprise-class database, this should take you from days down to hours. You can additionally use the parallel option to get a further speed-up.
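The reason the parallel option helps is that an equi-join on ID can be split into independent pieces: if both inputs are hash-distributed on ID, each piece can be joined on its own and the results concatenated. The sketch below models that decomposition in Python with a thread pool. It is only a model: CPython threads do not speed up CPU-bound code, whereas Oracle's parallel execution servers run the pieces truly in parallel. In Oracle, parallelism for a statement is typically requested with a PARALLEL hint.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def hash_join(build_rows, probe_rows):
    # Serial containment join: build on table2 rows, probe with table1 rows.
    buckets = defaultdict(list)
    for key, var, vf2, vt2 in build_rows:
        buckets[key].append((var, vf2, vt2))
    return [(key, vf1, vt1, var)
            for key, vf1, vt1 in probe_rows
            for var, vf2, vt2 in buckets.get(key, ())
            if vf1 >= vf2 and vt1 <= vt2]

def partition(rows, n):
    # Hash-distribute rows on the join key (first field), so all rows of a given
    # id land in the same partition number in both inputs.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts

def parallel_hash_join(build_rows, probe_rows, n_workers=4):
    build_parts = partition(build_rows, n_workers)
    probe_parts = partition(probe_rows, n_workers)
    # Each worker joins one partition pair; no match can cross partitions.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pieces = pool.map(hash_join, build_parts, probe_parts)
        return [row for piece in pieces for row in piece]
```

The result is the same multiset of rows as the serial join, only in a different order.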
Good luck!