Join elimination not working in Oracle with sub queries
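I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modeling, but first I need to find a way around this problem. I am using Oracle 12c Enterprise Edition Release 12.1.0.2.0.
DDL for my test case: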
drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product cascade constraints;
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key(product_id, from_date)
);
Some sample data:
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);
insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');
insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);
insert into product_price values(2, date '2016-02-01', 5);
insert into product_price values(4, date '2016-01-01', 10);
commit;
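The first view does not compile: it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined in the online examples of anchor modeling that I have looked at...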
create view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc on(
pc.product_id = p.product_id
)
left join product_price pp on(
pp.product_id = p.product_id
and pp.from_date = (select max(pp2.from_date)
from product_price pp2
where pp2.product_id = pp.product_id)
);
Below is my attempt at fixing it. When this view is used with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.
create view product_5nf as
select product_id
,pc.color
,pp.price
from product p
left join product_color pc using(product_id)
left join (select pp1.product_id, pp1.price
from product_price pp1
where pp1.from_date = (select max(pp2.from_date)
from product_price pp2
where pp2.product_id = pp1.product_id)
)pp using(product_id);
select product_id
from product_5nf;
----------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
|* 1 | HASH JOIN OUTER | | 4 |
| 2 | INDEX FAST FULL SCAN| PRODUCT_PK | 4 |
| 3 | VIEW | | 3 |
| 4 | NESTED LOOPS | | 3 |
| 5 | VIEW | VW_SQ_1 | 5 |
| 6 | HASH GROUP BY | | 5 |
| 7 | INDEX FULL SCAN | PRODUCT_PRICE_PK | 5 |
|* 8 | INDEX UNIQUE SCAN | PRODUCT_PRICE_PK | 1 |
----------------------------------------------------------
The only solution I have found is to use a scalar subquery instead, like this:
create or replace view product_5nf as
select p.product_id
,pc.color
,(select pp.price
from product_price pp
where pp.product_id = p.product_id
and pp.from_date = (select max(from_date)
from product_price pp2
where pp2.product_id = pp.product_id)) as price
from product p
left join product_color pc on(
pc.product_id = p.product_id
);
select product_id
from product_5nf;
---------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
| 1 | INDEX FAST FULL SCAN| PRODUCT_PK | 4 |
---------------------------------------------------
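Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently than joins, and the way they are executed simply does not allow me to get acceptable performance in a real-world scenario.
TL;DR: How can I rewrite the view product_5nf so that Oracle successfully eliminates both of the dependent tables?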
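I think you have two problems going on here.
First, join elimination only works in certain specific situations (PK-PK, PK-FK, etc.). It is not a general mechanism where you can LEFT JOIN to any row set that returns a single row per join key value and have Oracle eliminate the join.
Second, even if Oracle were advanced enough to do join elimination on any LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join elimination on LEFT JOINs that are based on a composite key (Oracle Support document 887553.1 says this is coming in R12.2).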
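As a point of comparison, the supported single-column PK-FK case can be checked with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. This is only a minimal sketch, assuming the tables from the question already exist; the view name product_color_v is hypothetical and not part of the original test case.

```sql
-- Hypothetical view containing only the single-column PK-FK outer join.
create or replace view product_color_v as
select p.product_id
      ,pc.color
  from product p
  left join product_color pc on (pc.product_id = p.product_id);

-- Only product_id is referenced, so product_color is a candidate for elimination.
explain plan for
select product_id from product_color_v;

-- Show the plan; if the join was eliminated, no PRODUCT_COLOR operation appears.
select * from table(dbms_xplan.display);
```

If the plan shows only an access to PRODUCT or PRODUCT_PK, the join was eliminated. The same check can be repeated against each variant of product_5nf.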
One workaround you could consider is materializing a view that holds the last row for each product_id, and then LEFT JOIN to that materialized view. Like this:
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key (product_id, from_date )
);
-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) virtual not null );
-- Create a MV snapshot so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price ) including new values;
-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM product_price
GROUP BY product_id;
-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );
-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);
insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');
insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);
insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);
insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);
commit;
-- Create the 5NF view using the materialized view
create or replace view product_5nf as
select p.product_id
,pc.color
,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
,to_number(substr(ppl.sortable_row,25)) price
from product p
left join product_color pc on pc.product_id = p.product_id
left join product_price_latest ppl on ppl.product_id = p.product_id
;
-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;
-- Check the plan
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
------------------------------------------------
| Id | Operation | Name | E-Rows |
------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | INDEX FULL SCAN | PRODUCT_PK | 1 |
------------------------------------------------
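The view above relies on sortable_row being a fixed-width string: 10 characters of product_id, 14 characters of date, then the price. The following sketch, using a made-up literal rather than table data, shows how the substr offsets recover the two values. It assumes non-negative values that fit in 10 characters, since lpad truncates anything longer.

```sql
-- Illustrative literal only: product_id 1, from_date 2016-05-01, price 5,
-- encoded the same way as the sortable_row expression above.
select to_date(substr(s, 11, 14), 'YYYYMMDDHH24MISS') as from_date  -- characters 11-24
      ,to_number(substr(s, 25))                       as price      -- characters 25 onward
  from (select lpad(1, 10, '0')
            || to_char(date '2016-05-01', 'YYYYMMDDHH24MISS')
            || lpad(5, 10, '0') as s
          from dual);
```

Because the date part sorts chronologically as text, taking max() of the whole string per product_id picks the row with the latest from_date.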
I could not get the price join eliminated, but if you do the following, it at least reduces the price lookup to a single access of the index:
CREATE OR REPLACE view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc ON p.product_id = pc.product_id
left join (select pp1.product_id, pp1.price
from (SELECT product_id,
price,
from_date,
max(from_date) OVER (PARTITION BY product_id) max_from_date
FROM product_price) pp1
where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;
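"Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently than joins, and the way they are executed simply does not allow me to get acceptable performance in a real-world scenario."
The cost-based optimizer in Oracle 12.1 can apply a query transformation that unnests scalar subqueries. So the performance can be just as good as with the LEFT JOIN you are looking for in your question.
The trick is that you have to jiggle it a bit.
First, make sure the scalar subquery returns a max() with no group by, so the CBO knows there is no possibility of getting more than one row. (It will not unnest otherwise.)
Second, you need to combine all the fields from product_price into a single scalar subquery, otherwise the CBO will unnest each subquery separately and join to product_price multiple times.
Here is a test case for Oracle 12.1 that illustrates this.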
drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product cascade constraints;
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key (product_id, from_date )
);
insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;
insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;
--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));
commit;
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT' ); end;
/
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_COLOR' ); end;
/
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_PRICE' ); end;
/
commit;
alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );
create or replace view product_5nf as
select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price
from
( select p.product_id
,pc.color
,( SELECT max(composite_column) FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
from product p
left join product_color pc on pc.product_id = p.product_id ) d
;
select product_id from product_5nf;
----------------------------------------------
| Id | Operation | Name | E-Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | TABLE ACCESS FULL| PRODUCT | 100K|
----------------------------------------------
select * from product_5nf;
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
--------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
|* 1 | HASH JOIN RIGHT OUTER | | 100K| 8387K| 3159K| 8835K (0)|
| 2 | VIEW | VW_SSQ_2 | 2 | | | |
| 3 | HASH GROUP BY | | 2 | 13M| 2332K| 12M (0)|
| 4 | VIEW | VM_NWVW_3 | 2 | | | |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 2 | 23M| 5055K| 20M (0)|
|* 7 | HASH JOIN | | 480K| 12M| 4262K| 17M (0)|
| 8 | TABLE ACCESS FULL| PRODUCT_PRICE | 220K| | | |
| 9 | TABLE ACCESS FULL| PRODUCT_PRICE | 220K| | | |
|* 10 | HASH JOIN OUTER | | 100K| 5918K| 3056K| 5847K (0)|
| 11 | TABLE ACCESS FULL | PRODUCT | 100K| | | |
| 12 | TABLE ACCESS FULL | PRODUCT_COLOR | 100K| | | |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ITEM_2"="P"."PRODUCT_ID")
5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")
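OK, I am answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production, but probably not for later versions. Do not upvote this answer, because it does not answer the question.
Due to a specific limitation in the current version (as mentioned by Mathew McPeak), it is simply not possible to have Oracle completely eliminate the unnecessary joins in the underlying 5NF view. The limitation is that join elimination cannot be done on left joins that are based on a composite key.
Any attempt to work around this limitation seems to introduce either duplication or update anomalies. The accepted answer demonstrates how to overcome this optimizer limitation by using a materialized view, thereby duplicating the data. This answer shows how to solve the problem with less duplication but with update anomalies.
This workaround is based on the fact that you can use nullable columns in unique indexes. We put null in latest_id for all historical versions, and the actual product_id for the latest version, with a foreign key referencing the product table.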
alter table product_price add(
latest_id number
,constraint product_price_uk unique(latest_id)
,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
,constraint product_price_chk check(latest_id = product_id)
);
-- One-time update of existing data
update product_price a
set a.latest_id = a.product_id
where from_date = (select max(from_date)
from product_price b
where a.product_id = b.product_id);
PRODUCT_ID FROM_DATE PRICE LATEST_ID
---------- ---------- ---------- ----------
1 2016-01-01 10 null
1 2016-02-01 8 null
1 2016-05-01 5 1
2 2016-02-01 5 2
4 2016-01-01 10 4
-- New view definition
create or replace view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc on(pc.product_id = p.product_id)
left join product_price pp on(pp.latest_id = p.product_id);
Of course, latest_id now has to be maintained manually... Whenever a new record is inserted, the old record must first be updated with null.
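A minimal sketch of that maintenance step, assuming a hypothetical new price of 4 for product 1 effective 2016-06-01 (these values are illustrative and not part of the original test data), and assuming the new from_date is later than every existing one for that product:

```sql
-- Step 1: demote the current latest row, so the unique constraint on
-- latest_id is not violated by the new row.
update product_price
   set latest_id = null
 where latest_id = 1;

-- Step 2: insert the new version and mark it as the latest one.
insert into product_price (product_id, from_date, price, latest_id)
values (1, date '2016-06-01', 4, 1);

commit;
```

Both statements should run in the same transaction. Otherwise other sessions could briefly see product 1 with no latest price at all.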
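There are two benefits to this approach. First, Oracle is able to completely remove the unnecessary joins. Second, the joins are not executed as scalar subqueries.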
SQL> select count(*) from product_5nf;
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------
Oracle recognizes that the count can be resolved without even touching the base table. No unnecessary joins are to be seen...
SQL> select product_id, price from product_5nf;
---------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | INDEX FULL SCAN | PRODUCT_PK |
| 3 | TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE |
|* 4 | INDEX FULL SCAN | PRODUCT_PRICE_UK |
---------------------------------------------------------
Oracle recognizes that we have to join to product_price in order to get the price column. And product_color is nowhere to be seen...
SQL> select * from product_5nf;
----------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | NESTED LOOPS OUTER | |
| 3 | INDEX FULL SCAN | PRODUCT_PK |
| 4 | TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR |
|* 5 | INDEX UNIQUE SCAN | PRODUCT_COLOR_PK |
| 6 | TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE |
|* 7 | INDEX FULL SCAN | PRODUCT_PRICE_UK |
----------------------------------------------------------
Here Oracle has to perform all the joins, since all columns are referenced.
[I don't know whether an ANTI-JOIN counts as a subquery in Oracle], but the not exists trick is often a way to avoid the aggregating subquery:
CREATE VIEW product_5nfa as
SELECT p.product_id
,pc.color
,pp.price
FROM product p
LEFT JOIN product_color pc
ON pc.product_id = p.product_id
LEFT join product_price pp
ON pp.product_id = p.product_id
AND NOT EXISTS ( SELECT * FROM product_price pp2
WHERE pp2.product_id = pp.product_id
AND pp2.from_date > pp.from_date
)
;
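To make the anti-join condition easier to inspect on its own, here is a standalone sketch of the same predicate run directly against product_price. It is only an illustration, assuming the tables from the question; it is not part of the original answer.

```sql
-- Keep a price row only if no later row exists for the same product.
-- Because (product_id, from_date) is the primary key, at most one row
-- per product_id survives, namely the one with the greatest from_date.
select pp.product_id
      ,pp.from_date
      ,pp.price
  from product_price pp
 where not exists (select null
                     from product_price pp2
                    where pp2.product_id = pp.product_id
                      and pp2.from_date > pp.from_date);
```

This selects the same rows as the max(from_date) subquery used in the question's views.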
Comment from the OP: The view is created, but Oracle is still not able to remove the join. Here is the execution plan.
select count(*) from product_5nfa;
-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS OUTER | |
| 3 | INDEX FULL SCAN | PRODUCT_PK |
| 4 | VIEW | |
| 5 | NESTED LOOPS ANTI| |
|* 6 | INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|* 7 | INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------