Calculate Min (From) and Max (To) Date per Group in a Dataset Using SQL in Oracle DB
I have the following initial dataset:
F_ID L_CAT CHG_DT
F1 VHL 01-FEB-2016
F1 VHL 10-FEB-2016
F1 VHL 15-FEB-2016
F1 MHL 20-FEB-2016
F1 VHL 25-FEB-2016
F1 VHL 28-FEB-2016
F1 MHL 05-MAR-2016
F1 MHL 10-MAR-2016
F2 VHL 01-FEB-2016
F2 VHL 10-FEB-2016
F2 MHL 18-FEB-2016
F2 MHL 21-FEB-2016
F2 VHL 25-FEB-2016
and I want to generate the following output using a SQL query in Oracle DB:
F_ID L_CAT FROM_DT TO_DT
F1 VHL 01-FEB-2016 20-FEB-2016
F1 MHL 20-FEB-2016 25-FEB-2016
F1 VHL 25-FEB-2016 05-MAR-2016
F1 MHL 05-MAR-2016 10-MAR-2016
F2 VHL 01-FEB-2016 18-FEB-2016
F2 MHL 18-FEB-2016 25-FEB-2016
F2 VHL 25-FEB-2016 25-FEB-2016
In other words, I want to calculate the time span during which each F_ID remains in a specific L_CAT. I am using Oracle 11g. Any lead towards a solution is highly appreciated.
Thanks
Code to produce the scenario is given below:
create table my_test
(
f_id varchar2(30),
l_cat varchar2(30),
chg_dt date
);
-- ANSI DATE literals do not depend on the session's NLS_DATE_FORMAT / NLS_DATE_LANGUAGE
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL',DATE '2016-02-01');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL',DATE '2016-02-10');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL',DATE '2016-02-15');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL',DATE '2016-02-20');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL',DATE '2016-02-25');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','VHL',DATE '2016-02-28');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL',DATE '2016-03-05');
insert into my_test(f_id, l_cat, chg_dt) values ('F1','MHL',DATE '2016-03-10');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL',DATE '2016-02-01');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL',DATE '2016-02-10');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL',DATE '2016-02-18');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','MHL',DATE '2016-02-21');
insert into my_test(f_id, l_cat, chg_dt) values ('F2','VHL',DATE '2016-02-25');
COMMIT;
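The expected output implies a rule the question does not spell out: TO_DT is the date on which the F_ID switches to its next L_CAT, or the run's own last date when no later change exists (which is why F2's final VHL row shows 25-FEB-2016 on both sides). A minimal plain-Python reference, assuming the rows fit in memory and each F_ID has distinct CHG_DT values, that reproduces the seven expected rows from the sample data and can serve as a check for the SQL answers:

```python
from datetime import date

# Sample rows from the question: (F_ID, L_CAT, CHG_DT).
ROWS = [
    ("F1", "VHL", date(2016, 2, 1)), ("F1", "VHL", date(2016, 2, 10)),
    ("F1", "VHL", date(2016, 2, 15)), ("F1", "MHL", date(2016, 2, 20)),
    ("F1", "VHL", date(2016, 2, 25)), ("F1", "VHL", date(2016, 2, 28)),
    ("F1", "MHL", date(2016, 3, 5)), ("F1", "MHL", date(2016, 3, 10)),
    ("F2", "VHL", date(2016, 2, 1)), ("F2", "VHL", date(2016, 2, 10)),
    ("F2", "MHL", date(2016, 2, 18)), ("F2", "MHL", date(2016, 2, 21)),
    ("F2", "VHL", date(2016, 2, 25)),
]


def spans(rows):
    """Return (f_id, l_cat, from_dt, to_dt) for each run of one L_CAT per F_ID."""
    # Build runs ("islands") of consecutive rows, by date, with the same F_ID and L_CAT.
    islands = []  # each entry: [f_id, l_cat, first_dt, last_dt]
    for f_id, l_cat, dt in sorted(rows, key=lambda r: (r[0], r[2])):
        last = islands[-1] if islands else None
        if last and last[0] == f_id and last[1] == l_cat:
            last[3] = dt  # the current run continues
        else:
            islands.append([f_id, l_cat, dt, dt])  # L_CAT (or F_ID) changed
    result = []
    for i, (f_id, l_cat, first_dt, last_dt) in enumerate(islands):
        nxt = islands[i + 1] if i + 1 < len(islands) else None
        # TO_DT: start of the next run of the same F_ID, else this run's last date.
        to_dt = nxt[2] if nxt and nxt[0] == f_id else last_dt
        result.append((f_id, l_cat, first_dt, to_dt))
    return result


for row in spans(ROWS):
    print(*row)
```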
This specific problem is called "gaps and islands". One method uses the difference of row numbers:
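select f_id, l_cat,
       min(chg_dt) as from_dt,
       max(chg_dt) as to_dt   -- last date inside the run, not the date of the next change
from (select t.*,
             -- position of the row within its F_ID
             row_number() over (partition by f_id order by chg_dt) as seqnum_i,
             -- position of the row within its F_ID and L_CAT
             row_number() over (partition by f_id, l_cat order by chg_dt) as seqnum_ic
      from my_test t
     ) i
group by f_id, l_cat, (seqnum_i - seqnum_ic)
order by f_id, from_dt;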
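Explaining how this works is challenging. But if you stare at the results from the subquery, you can see how the difference in row numbers defines the groups you want: within a run of consecutive rows that share an L_CAT, seqnum_i and seqnum_ic both increase by 1 per row, so their difference stays constant; when a row of another L_CAT intervenes, seqnum_i advances but seqnum_ic does not, so the difference changes. Grouping by F_ID, L_CAT and that difference therefore yields one group per run.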
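To look at the subquery's results without an Oracle instance, here is a sketch that runs the same row-number query in an in-memory SQLite database from Python (assumption: SQLite 3.25 or later, which added ROW_NUMBER and LEAD; dates are stored as ISO text so they sort correctly). It prints each row with seqnum_i, seqnum_ic and their difference, then groups by that difference. Because MAX(chg_dt) gives the last date inside a run rather than the requested TO_DT, the second query also applies LEAD over the grouped runs; that final step is an addition of this sketch, not part of the answer.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (ROW_NUMBER, LEAD) stands in for Oracle;
# dates are ISO strings so text order equals date order.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_test (f_id text, l_cat text, chg_dt text)")
conn.executemany(
    "insert into my_test values (?, ?, ?)",
    [
        ("F1", "VHL", "2016-02-01"), ("F1", "VHL", "2016-02-10"),
        ("F1", "VHL", "2016-02-15"), ("F1", "MHL", "2016-02-20"),
        ("F1", "VHL", "2016-02-25"), ("F1", "VHL", "2016-02-28"),
        ("F1", "MHL", "2016-03-05"), ("F1", "MHL", "2016-03-10"),
        ("F2", "VHL", "2016-02-01"), ("F2", "VHL", "2016-02-10"),
        ("F2", "MHL", "2016-02-18"), ("F2", "MHL", "2016-02-21"),
        ("F2", "VHL", "2016-02-25"),
    ],
)

# The answer's subquery, reading from my_test.
SUBQUERY = """
    select f_id, l_cat, chg_dt,
           row_number() over (partition by f_id order by chg_dt) as seqnum_i,
           row_number() over (partition by f_id, l_cat order by chg_dt) as seqnum_ic
    from my_test
"""

# Step 1: the row numbers and their difference, one line per input row.
sub = conn.execute(
    "select f_id, l_cat, chg_dt, seqnum_i, seqnum_ic, seqnum_i - seqnum_ic"
    f" from ({SUBQUERY}) order by f_id, chg_dt"
).fetchall()
for row in sub:
    print(*row)

# Step 2: one group per (f_id, l_cat, difference); LEAD over the groups then
# uses the next run's start as TO_DT, falling back to the run's last date.
islands = conn.execute(f"""
    with g as (
        select f_id, l_cat, min(chg_dt) as from_dt, max(chg_dt) as last_dt
        from ({SUBQUERY})
        group by f_id, l_cat, seqnum_i - seqnum_ic
    )
    select f_id, l_cat, from_dt,
           coalesce(lead(from_dt) over (partition by f_id order by from_dt),
                    last_dt) as to_dt
    from g
    order by f_id, from_dt
""").fetchall()
for row in islands:
    print(*row)
```

For F1 the difference runs 0, 0, 0, 3, 1, 1, 5, 5, so each run gets its own value. The difference alone is not unique across categories: F2's MHL run and its final VHL row both get 2, which is why L_CAT has to be in the GROUP BY as well. The LEAD-over-groups step should carry over to Oracle 11g, which supports LEAD and COALESCE, but it has not been tested there.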
First of all, thank you for the test case you provided!
How about this?
with inter as
 (select f_id, l_cat, chg_dt,
         -- date of the next row for the same F_ID (NULL on its last row)
         lead(chg_dt) over (partition by f_id order by chg_dt) lead_dt,
         case when lag(l_cat, 1, 1) over (order by f_id, chg_dt) <> l_cat
              then 1
         end sgrp -- 1 marks a row whose L_CAT differs from the previous row
  from my_test
 ),
inter_2 as
 (select f_id, l_cat, chg_dt, lead_dt,
         sum(sgrp) over (order by f_id, chg_dt) grp -- running count of changes = group number
  from inter
 )
select f_id, l_cat,
       min(chg_dt) from_dt,
       -- a final group with a single row has no next date, so fall back to its own date
       nvl(max(lead_dt), min(chg_dt)) to_dt
from inter_2
group by f_id, l_cat, grp
order by 1, 3;
F_ID L_CAT FROM_DT TO_DT
----- ---------- ----------- -----------
F1 VHL 01-feb-2016 20-feb-2016
F1 MHL 20-feb-2016 25-feb-2016
F1 VHL 25-feb-2016 05-mar-2016
F1 MHL 05-mar-2016 10-mar-2016
F2 VHL 01-feb-2016 18-feb-2016
F2 MHL 18-feb-2016 25-feb-2016
F2 VHL 25-feb-2016 25-feb-2016
7 rows selected.
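A sketch of the same LAG / running SUM / LEAD query run against SQLite from Python, useful for experimenting without an Oracle instance (assumptions: SQLite 3.25 or later; ISO date strings instead of Oracle DATE values; NVL written as COALESCE, which both databases support; a string default '1' for LAG so the comparison is text-to-text). LAG flags the first row of each run, the running SUM turns those flags into a run number, and LEAD supplies each row's next change date, so MAX(lead_dt) over a run is the date of the following change.

```python
import sqlite3

# Assumption: SQLite >= 3.25 (window functions) stands in for Oracle 11g;
# dates are ISO strings, which sort the same way as the DATE values.
conn = sqlite3.connect(":memory:")
conn.execute("create table my_test (f_id text, l_cat text, chg_dt text)")
conn.executemany(
    "insert into my_test values (?, ?, ?)",
    [
        ("F1", "VHL", "2016-02-01"), ("F1", "VHL", "2016-02-10"),
        ("F1", "VHL", "2016-02-15"), ("F1", "MHL", "2016-02-20"),
        ("F1", "VHL", "2016-02-25"), ("F1", "VHL", "2016-02-28"),
        ("F1", "MHL", "2016-03-05"), ("F1", "MHL", "2016-03-10"),
        ("F2", "VHL", "2016-02-01"), ("F2", "VHL", "2016-02-10"),
        ("F2", "MHL", "2016-02-18"), ("F2", "MHL", "2016-02-21"),
        ("F2", "VHL", "2016-02-25"),
    ],
)

QUERY = """
with inter as (
    select f_id, l_cat, chg_dt,
           -- next change date for the same F_ID (NULL on its last row)
           lead(chg_dt) over (partition by f_id order by chg_dt) as lead_dt,
           -- 1 when L_CAT differs from the previous row: a new run starts
           case when lag(l_cat, 1, '1') over (order by f_id, chg_dt) <> l_cat
                then 1 end as sgrp
    from my_test
),
inter_2 as (
    -- running count of run starts = run number
    select f_id, l_cat, chg_dt, lead_dt,
           sum(sgrp) over (order by f_id, chg_dt) as grp
    from inter
)
select f_id, l_cat,
       min(chg_dt) as from_dt,
       coalesce(max(lead_dt), min(chg_dt)) as to_dt
from inter_2
group by f_id, l_cat, grp
order by 1, 3
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```

The LAG is not partitioned by F_ID, so a new F_ID whose first L_CAT matched the previous F_ID's last one would not get a new run number; the query still keeps them apart because F_ID is also in the GROUP BY. The COALESCE fallback covers a final run whose only row has no next date, such as F2's last VHL row.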