Oracle: select missing dates
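I have a table with (among other things) dates in a field.
I need a list of all dates that are more recent than the oldest date, older than the most recent date, and completely missing from the table.
So, if the table contains: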
2012-01-02
2012-01-02
2012-01-03
2012-01-05
2012-01-05
2012-01-07
2012-01-08
I want a query that returns:
2012-01-04
2012-01-06
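To pin down what "missing" means here (every calendar day strictly between the oldest and the newest date that never appears, with duplicates irrelevant), here is a small reference sketch in Python. It is not taken from any answer below, and the function name missing_dates is hypothetical; it is only a way to check a query's output against the sample data.

```python
from datetime import date, timedelta


def missing_dates(dates):
    """Return the days strictly between min(dates) and max(dates) that never occur in dates."""
    present = set(dates)  # duplicates such as 2012-01-02 collapse here
    if not present:
        return []
    oldest, recent = min(present), max(present)
    result = []
    day = oldest + timedelta(days=1)
    while day < recent:
        if day not in present:
            result.append(day)
        day += timedelta(days=1)
    return result


sample = [date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 3),
          date(2012, 1, 5), date(2012, 1, 5), date(2012, 1, 7),
          date(2012, 1, 8)]
print(missing_dates(sample))  # [datetime.date(2012, 1, 4), datetime.date(2012, 1, 6)]
```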
Something like this (assuming your table is named your_table and the date column is named the_date):
with date_range as (
select min(the_date) as oldest,
max(the_date) as recent,
max(the_date) - min(the_date) as total_days
from your_table
),
all_dates as (
select oldest + level - 1 as a_date
from date_range
connect by level <= (select total_days from date_range)
)
select ad.a_date
from all_dates ad
left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;
Edit:
The WITH clause is called a "common table expression" (CTE) and is equivalent to a derived table ("inline view").
It is similar to:
select *
from (
.....
) all_dates
join your_table ...
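The second CTE simply creates the list of dates "on the fly", using an undocumented feature of Oracle's connect by implementation.
Re-using a select (as I do for computing the first and last date) is easier (and IMHO more readable) than using derived tables.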
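To make the row-generator arithmetic concrete, the sketch below replays the first query step by step in plain Python; it is an illustration only, and missing_via_generator is a hypothetical name. With the sample data total_days is 6, so LEVEL runs from 1 to 6 and oldest + level - 1 yields 2012-01-02 through 2012-01-07. The newest date itself is never generated, which is harmless because it is always present in the table by definition.

```python
from datetime import date, timedelta


def missing_via_generator(dates):
    # date_range CTE: oldest, recent and total_days = recent - oldest
    oldest, recent = min(dates), max(dates)
    total_days = (recent - oldest).days
    # all_dates CTE: "oldest + level - 1 ... connect by level <= total_days",
    # with LEVEL taking the values 1 .. total_days
    all_dates = [oldest + timedelta(days=level - 1)
                 for level in range(1, total_days + 1)]
    # LEFT JOIN your_table ... WHERE yt.the_date IS NULL keeps unmatched days
    present = set(dates)
    return [d for d in all_dates if d not in present]


sample = [date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 3),
          date(2012, 1, 5), date(2012, 1, 5), date(2012, 1, 7),
          date(2012, 1, 8)]
print(missing_via_generator(sample))
# [datetime.date(2012, 1, 4), datetime.date(2012, 1, 6)]
```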
Edit 2:
This can also be done with a recursive CTE:
with date_range as (
select min(the_date) as oldest,
max(the_date) as recent,
max(the_date) - min(the_date) as total_days
from your_table
),
all_dates (a_date, lvl) as (
select oldest as a_date, 1 as lvl
from date_range
union all
select (select oldest from date_range) + lvl, lvl + 1
from all_dates
where lvl < (select total_days from date_range)
)
select ad.a_date, lvl
from all_dates ad
left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;
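This should work on all DBMSs that support recursive CTEs (PostgreSQL and Firebird, being more standards-compliant, do need the recursive keyword, though).
Note the hack select (select oldest from date_range) + lvl, lvl + 1 in the recursive part. It should not be necessary, but Oracle still has some bugs regarding DATEs in recursive CTEs. In PostgreSQL, the following works without problems: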
....
all_dates (a_date, lvl) as (
select oldest as a_date, 0 as lvl
from date_range
union all
select a_date + 1, lvl + 1
from all_dates
where lvl < (select total_days from date_range)
)
....
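As a portable check of the PostgreSQL-style variant (lvl starting at 0, stepping a_date by one day), here is a sketch that runs it against SQLite through Python's sqlite3 module. SQLite is my choice for illustration, not something the answer mentions; it has no DATE arithmetic, so this assumes ISO-8601 text dates and uses date(a_date, '+1 day') and julianday() in place of a_date + 1 and max - min.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (the_date text)")  # ISO-8601 text dates
conn.executemany(
    "insert into your_table values (?)",
    [(d,) for d in ("2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                    "2012-01-05", "2012-01-07", "2012-01-08")],
)

query = """
with recursive date_range as (
  select min(the_date) as oldest,
         julianday(max(the_date)) - julianday(min(the_date)) as total_days
  from your_table
),
all_dates (a_date, lvl) as (
  select oldest, 0 from date_range
  union all
  -- step one day from the previous row, like "a_date + 1" in PostgreSQL
  select date(a_date, '+1 day'), lvl + 1
  from all_dates
  where lvl < (select total_days from date_range)
)
select ad.a_date
from all_dates ad
left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)  # ['2012-01-04', '2012-01-06']
```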
We can use a simple hierarchical query like this:
WITH CTE AS
(SELECT (SELECT MIN(COL1) FROM T)+LEVEL-1 AS OUT FROM DUAL
CONNECT BY (LEVEL-1) <= (SELECT MAX(COL1) - MIN(COL1) FROM T))
SELECT OUT FROM CTE WHERE OUT NOT IN (SELECT COL1 FROM T);
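One caveat with the final NOT IN (SELECT COL1 FROM T) filter: if COL1 can contain NULL, NOT IN returns no rows at all, because comparing a value with NULL is unknown rather than false. MIN and MAX ignore NULLs, so the generated range is unaffected; only the filter breaks. The sketch below shows this in SQLite via Python's sqlite3 (the NULL semantics of NOT IN are standard SQL and apply to Oracle too) and uses NOT EXISTS as an alternative; t and col1 come from the answer, while the candidates table is a hypothetical stand-in for the generated dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (col1 text)")
conn.executemany("insert into t values (?)",
                 [("2012-01-02",), ("2012-01-05",), (None,)])  # one NULL row
# hypothetical stand-in for the dates produced by the hierarchical CTE
conn.execute("create table candidates (d text)")
conn.executemany("insert into candidates values (?)",
                 [("2012-01-03",), ("2012-01-04",)])

# NOT IN: "d <> NULL" is unknown, so no candidate ever qualifies
not_in = conn.execute(
    "select d from candidates where d not in (select col1 from t) order by d"
).fetchall()

# NOT EXISTS only looks for matching rows, so the NULL row does no harm
not_exists = conn.execute(
    "select d from candidates c "
    "where not exists (select 1 from t where t.col1 = c.d) order by d"
).fetchall()

print(not_in)      # []
print(not_exists)  # [('2012-01-03',), ('2012-01-04',)]
```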
I'd go for this variant, because it is more efficient:
with all_dates_wo_boundary_values as
( select oldest + level the_date
from ( select min(the_date) oldest
, max(the_date) recent
from your_table
)
connect by level <= recent - oldest - 1
)
select the_date
from all_dates_wo_boundary_values
minus
select the_date
from your_table
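This variant generates only the interior days (oldest + 1 up to recent - 1) and lets the set operator remove existing dates and duplicates. A sketch of the same idea in SQLite through Python's sqlite3 module, assuming ISO-8601 text dates: SQLite spells MINUS as EXCEPT and has no CONNECT BY, so a recursive CTE stands in for the row generator, and the span > 1 guard is my own addition so that nothing is generated when there are no interior days.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (the_date text)")
conn.executemany(
    "insert into your_table values (?)",
    [(d,) for d in ("2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                    "2012-01-05", "2012-01-07", "2012-01-08")],
)

query = """
with recursive bounds as (
  select min(the_date) as oldest,
         cast(julianday(max(the_date)) - julianday(min(the_date)) as integer) as span
  from your_table
),
all_dates_wo_boundary_values (the_date, lvl) as (
  -- first interior day; nothing is generated when the span has no interior days
  select date(oldest, '+1 day'), 1 from bounds where span > 1
  union all
  -- keep going while lvl stays within recent - oldest - 1
  select date(the_date, '+1 day'), lvl + 1
  from all_dates_wo_boundary_values
  where lvl < (select span - 1 from bounds)
)
select the_date from all_dates_wo_boundary_values
except
select the_date from your_table
order by the_date
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)  # ['2012-01-04', '2012-01-06']
```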
Here is some proof.
First, the setup:
SQL> create table your_table (the_date)
2 as
3 select date '2012-01-02' from dual union all
4 select date '2012-01-02' from dual union all
5 select date '2012-01-03' from dual union all
6 select date '2012-01-05' from dual union all
7 select date '2012-01-05' from dual union all
8 select date '2012-01-07' from dual union all
9 select date '2012-01-08' from dual
10 /
Table created.
SQL> exec dbms_stats.gather_table_stats(user,'your_table')
PL/SQL procedure successfully completed.
SQL> alter session set statistics_level = all
2 /
Session altered.
Horse's query (the first answer above):
SQL> with date_range as
2 ( select min(the_date) as oldest
3 , max(the_date) as recent
4 , max(the_date) - min(the_date) as total_days
5 from your_table
6 )
7 , all_dates as
8 ( select ( select oldest from date_range) + level as a_date
9 from dual
10 connect by level <= (select total_days from date_range)
11 )
12 select ad.a_date
13 from all_dates ad
14 left join your_table yt on ad.a_date = yt.the_date
15 where yt.the_date is null
16 order by ad.a_date
17 /
A_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00
2 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gaqx49vb9gz9k, child number 0
-------------------------------------
with date_range as ( select min(the_date) as oldest , max(the_date) as recent , max(the_date) - min(the_date) as total_d
ays from your_table )
, all_dates as ( select ( select oldest from date_range) + level as a_date from dual connect by level <= (select total_days from
date_range) ) select
ad.a_date from all_dates ad left join your_table yt on ad.a_date = yt.the_date where yt.the_date is null order by ad.a_date
Plan hash value: 1419150012
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | TEMP TABLE TRANSFORMATION | | 1 | | 2 |00:00:00.01 | 22 | 1 | 1 | | | |
| 2 | LOAD AS SELECT | | 1 | | 1 |00:00:00.01 | 7 | 0 | 1 | 262K| 262K| 262K (0)|
| 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 4 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | 0 | 0 | | | |
| 5 | SORT ORDER BY | | 1 | 1 | 2 |00:00:00.01 | 12 | 1 | 0 | 2048 | 2048 | 2048 (0)|
|* 6 | FILTER | | 1 | | 2 |00:00:00.01 | 12 | 1 | 0 | | | |
|* 7 | HASH JOIN OUTER | | 1 | 1 | 7 |00:00:00.01 | 12 | 1 | 0 | 1048K| 1048K| 707K (0)|
| 8 | VIEW | | 1 | 1 | 6 |00:00:00.01 | 9 | 1 | 0 | | | |
| 9 | CONNECT BY WITHOUT FILTERING| | 1 | | 6 |00:00:00.01 | 3 | 0 | 0 | | | |
| 10 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | 0 | 0 | | | |
| 11 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 12 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660C_81240964 | 1 | 1 | 1 |00:00:00.01 | 3 | 0 | 0 | | | |
| 13 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | 0 | 0 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - filter("YT"."THE_DATE" IS NULL)
7 - access("YT"."THE_DATE"=INTERNAL_FUNCTION("AD"."A_DATE"))
32 rows selected.
My suggestion:
SQL> with all_dates_wo_boundary_values as
2 ( select oldest + level the_date
3 from ( select min(the_date) oldest
4 , max(the_date) recent
5 from your_table
6 )
7 connect by level <= recent - oldest - 1
8 )
9 select the_date
10 from all_dates_wo_boundary_values
11 minus
12 select the_date
13 from your_table
14 /
THE_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00
2 rows selected.
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
2 /
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 7aavxmzkj7zq7, child number 0
-------------------------------------
with all_dates_wo_boundary_values as ( select oldest + level the_date from ( select min(the_date) oldest
, max(the_date) recent from your_table ) connect by level <= recent - oldest - 1 ) select
the_date from all_dates_wo_boundary_values minus select the_date from your_table
Plan hash value: 2293301832
-----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
| 1 | MINUS | | 1 | | 2 |00:00:00.01 | 6 | | | |
| 2 | SORT UNIQUE | | 1 | 1 | 5 |00:00:00.01 | 3 | 9216 | 9216 | 8192 (0)|
| 3 | VIEW | | 1 | 1 | 5 |00:00:00.01 | 3 | | | |
| 4 | CONNECT BY WITHOUT FILTERING| | 1 | | 5 |00:00:00.01 | 3 | | | |
| 5 | VIEW | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
| 7 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | | | |
| 8 | SORT UNIQUE | | 1 | 7 | 5 |00:00:00.01 | 3 | 9216 | 9216 | 8192 (0)|
| 9 | TABLE ACCESS FULL | YOUR_TABLE | 1 | 7 | 7 |00:00:00.01 | 3 | | | |
-----------------------------------------------------------------------------------------------------------------------------------
22 rows selected.
Regards,
Rob.
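You don't need to generate all the dates and then use MINUS (or an anti-join) to remove the existing rows, which can be slow. Instead, you can use the LEAD analytic function to find the next date, and then use CROSS JOIN LATERAL (available from Oracle 12) to join to a row generator that produces only the missing dates: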
SELECT m.missing
FROM (
SELECT dt,
LEAD(dt) OVER (ORDER BY dt) AS next_dt
FROM table_name
) t
CROSS JOIN LATERAL (
SELECT dt + LEVEL AS missing
FROM DUAL
WHERE dt + 1 < next_dt
CONNECT BY dt + LEVEL < next_dt
) m
Which, for the sample data:
CREATE TABLE table_name (dt) AS
SELECT DATE '2012-01-02' FROM DUAL UNION ALL
SELECT DATE '2012-01-02' FROM DUAL UNION ALL
SELECT DATE '2012-01-03' FROM DUAL UNION ALL
SELECT DATE '2012-01-05' FROM DUAL UNION ALL
SELECT DATE '2012-01-05' FROM DUAL UNION ALL
SELECT DATE '2012-01-07' FROM DUAL UNION ALL
SELECT DATE '2012-01-08' FROM DUAL;
Outputs:
MISSING
2012-01-04 00:00:00
2012-01-06 00:00:00
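The LEAD idea can be checked without a database. This Python sketch (missing_via_lead is a hypothetical name) pairs each date with its successor in sorted order, as LEAD(dt) OVER (ORDER BY dt) does, and generates days only inside each gap, so only the missing days are ever produced.

```python
from datetime import date, timedelta


def missing_via_lead(dates):
    ordered = sorted(dates)
    missing = []
    # zip pairs each dt with the following date, like LEAD(dt) OVER (ORDER BY dt);
    # the last date has no successor (LEAD gives NULL) and zip simply drops it
    for dt, next_dt in zip(ordered, ordered[1:]):
        # row generator: dt + LEVEL while it is still before next_dt;
        # duplicate or consecutive dates produce nothing
        day = dt + timedelta(days=1)
        while day < next_dt:
            missing.append(day)
            day += timedelta(days=1)
    return missing


sample = [date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 3),
          date(2012, 1, 5), date(2012, 1, 5), date(2012, 1, 7),
          date(2012, 1, 8)]
print(missing_via_lead(sample))
# [datetime.date(2012, 1, 4), datetime.date(2012, 1, 6)]
```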
You need a Calendar table (either a permanent one or one created on the fly). Then you can simply do:
SELECT c.my_date
FROM
calendar c
JOIN
( SELECT MIN(date_column) AS min_date
, MAX(date_column) AS max_date
FROM tableX
) mm
ON c.my_date BETWEEN min_date AND max_date
WHERE
c.my_date NOT IN
( SELECT date_column
FROM tableX
)
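A minimal end-to-end sketch of the calendar-table approach, using SQLite via Python's sqlite3 and assuming dates stored as ISO-8601 text (so BETWEEN compares correctly as strings). The calendar table here is hypothetical and covers only 2012; a real one would span whatever range your data needs. The query is the one from this answer, with the join column spelled my_date throughout.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("create table tableX (date_column text)")
conn.executemany(
    "insert into tableX values (?)",
    [(d,) for d in ("2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                    "2012-01-05", "2012-01-07", "2012-01-08")],
)

# hypothetical permanent calendar: one row per day of 2012 (a leap year, 366 days)
conn.execute("create table calendar (my_date text primary key)")
start = date(2012, 1, 1)
conn.executemany(
    "insert into calendar values (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(366)],
)

query = """
select c.my_date
from calendar c
join (select min(date_column) as min_date,
             max(date_column) as max_date
      from tableX) mm
  on c.my_date between mm.min_date and mm.max_date
where c.my_date not in (select date_column from tableX)
order by c.my_date
"""
rows = [r[0] for r in conn.execute(query)]
print(rows)  # ['2012-01-04', '2012-01-06']
```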