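How to get min max date per row type
Scenario: a company has many branches in many states. A state may have more than one branch. Whenever an employee transfers from one branch to another, a row is entered in the table like this: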
| EID | DT | BRANCH | STATE |
|-----|-------------|--------|-------|
| 1 | 01-JAN-2000 | A | AA |
| 1 | 01-JAN-2001 | B | AA |
| 1 | 01-JAN-2002 | C | AA |
| 1 | 01-JAN-2003 | D | AA |
| 1 | 01-JAN-2004 | E | BB |
| 1 | 01-JAN-2005 | F | BB |
| 1 | 01-JAN-2006 | G | BB |
| 1 | 01-JAN-2007 | H | BB |
| 1 | 01-JAN-2008 | A | AA |
| 1 | 01-JAN-2009 | B | AA |
| 1 | 01-JAN-2010 | C | AA |
| 1 | 01-JAN-2011 | D | AA |
The requirement is to find how long the employee stayed in each state. The output should look like this:
| STATE | MIN | MAX | Duration |
|-------|-------------|-------------|-------------|
| AA | 01-JAN-2000 | 01-JAN-2003 | 3 |
| BB | 01-JAN-2004 | 01-JAN-2007 | 3 |
| AA | 01-JAN-2008 | 01-JAN-2011 | 3 |
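I can't seem to figure out how to do this in PL/SQL. The long way would be to use a for loop to iterate over every row and work out the duration. But is there a way to do it in PL/SQL without a loop?
Here is a SQLFiddle demo: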
WITH groups AS (
  SELECT
    t1.*,
    ROW_NUMBER() OVER ( ORDER BY dt )
      - ROW_NUMBER() OVER ( PARTITION BY state ORDER BY dt ) AS grp
  FROM t1
)
SELECT state,
       MIN( dt ) AS first_date,
       MAX( dt ) AS last_date,
       TRUNC( ( MAX( dt ) - MIN( dt ) ) / 365 ) AS duration
FROM groups
GROUP BY state, grp
ORDER BY first_date
Results:
| STATE | FIRST_DATE | LAST_DATE | DURATION |
|-------|--------------------------------|--------------------------------|----------|
| AA | January, 01 2000 00:00:00+0000 | January, 01 2003 00:00:00+0000 | 3 |
| BB | January, 01 2004 00:00:00+0000 | January, 01 2007 00:00:00+0000 | 3 |
| AA | January, 01 2008 00:00:00+0000 | January, 01 2011 00:00:00+0000 | 3 |
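The `grp` value in the query above is the difference of two row numbers, and that idea can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: the function name `islands_by_row_number_difference` is made up for illustration, and the rows are the sample data from the question reduced to `(dt, state)` pairs for the single employee.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question as (dt, state) pairs for employee 1.
STATES = ["AA"] * 4 + ["BB"] * 4 + ["AA"] * 4
rows = [(date(2000 + i, 1, 1), state) for i, state in enumerate(STATES)]


def islands_by_row_number_difference(rows):
    """Return (state, first_date, last_date, duration_years) per consecutive run.

    Hypothetical helper that mirrors the SQL: grp is the overall row number
    minus the row number within the row's own state.
    """
    seen_per_state = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY state ORDER BY dt)
    groups = {}
    # enumerate(..., start=1) plays the role of ROW_NUMBER() OVER (ORDER BY dt)
    for overall_rn, (dt, state) in enumerate(sorted(rows), start=1):
        seen_per_state[state] += 1
        grp = overall_rn - seen_per_state[state]
        groups.setdefault((state, grp), []).append(dt)
    result = [
        # Whole years, like TRUNC((MAX(dt) - MIN(dt)) / 365) in the query
        (state, min(dts), max(dts), (max(dts) - min(dts)).days // 365)
        for (state, _grp), dts in groups.items()
    ]
    return sorted(result, key=lambda r: r[1])  # ORDER BY first_date


for row in islands_by_row_number_difference(rows):
    print(row)
```

For the sample data the three runs get the keys (AA, 0), (BB, 4) and (AA, 4). The `grp` value alone is not unique, which is why the query groups by both `state` and `grp`.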
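As for how it works:
The `groups` subquery selects every row and assigns it to a group by subtracting the row's number within its own `state` from its row number across all states. That difference stays constant within each consecutive run of one state. The outer query then groups everything on `state` and `grp` and finds the `min`, `max` and `difference` of the dates in each group.
Here is one way to do it: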
select max(z.state) as state
, min(z.dt) as min_date /* main query */
, max(z.dt) as max_date
, trunc((max(z.dt) - min(z.dt)) / 365) as duaration
from (select q.eid
, q.dt /* query # 2*/
, state
, sum(grp) over(order by q.dt) as grp
from (select eid
, dt
, state /* query # 1*/
, case
when state <> lag(state) over(order by dt)
then 1
end as grp
from t1 ) q
) z
group by z.grp
Result:
STATE MIN_DATE MAX_DATE DUARATION
----- ----------- ----------- ----------
AA 01-JAN-00 01-JAN-03 3
BB 01-JAN-04 01-JAN-07 3
AA 01-JAN-08 01-JAN-11 3
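Addendum #1: explanation of the query.
To get the minimum and maximum dates we would normally just apply a `group by` clause, but we cannot do that here, because there is a logical difference between the `AA` state that comes before `BB` and the `AA` state that comes after `BB`. So we have to do something to separate them into different logical groups. That is the job of the innermost query (`/* query # 1*/`) and of `/* query # 2*/`. Query #1 finds the moments when the state changes: it compares the current row's `state` with the previous one, using the `lag() over()` function to reference the previous row in the data set. Query #2 forms the logical groups by computing a running total of `grp`; the `sum() over()` analytic function is responsible for that.
Query #1 gives us: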
EID DT STATE GRP
---------- ----------- ----- ----------
1 01-JAN-2000 AA
1 01-JAN-2001 AA
1 01-JAN-2002 AA
1 01-JAN-2003 AA
1 01-JAN-2004 BB 1 --<-- moment when state changes
1 01-JAN-2005 BB
1 01-JAN-2006 BB
1 01-JAN-2007 BB
1 01-JAN-2008 AA 1 --<-- moment when state changes
1 01-JAN-2009 AA
1 01-JAN-2010 AA
1 01-JAN-2011 AA
Query #2 forms the logical groups:
EID DT STATE GRP
---------- ----------- ----- ----------
1 01-JAN-2000 AA
1 01-JAN-2001 AA
1 01-JAN-2002 AA
1 01-JAN-2003 AA
1 01-JAN-2004 BB 1
1 01-JAN-2005 BB 1
1 01-JAN-2006 BB 1
1 01-JAN-2007 BB 1
1 01-JAN-2008 AA 2
1 01-JAN-2009 AA 2
1 01-JAN-2010 AA 2
1 01-JAN-2011 AA 2
Then, in the main query, we simply group by `GRP` to produce the final output.
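The two steps described above (flag each state change, then take a running total of the flags) can be sketched in Python as follows. This is an illustration under assumptions, not the answer's own code: `islands_by_change_flag` is a hypothetical name, the rows are the question's sample data for one employee, and the first group gets the id 0 here, whereas the SQL leaves it NULL.

```python
from datetime import date
from itertools import accumulate

# Sample data from the question as (dt, state) pairs for employee 1.
STATES = ["AA"] * 4 + ["BB"] * 4 + ["AA"] * 4
rows = [(date(2000 + i, 1, 1), state) for i, state in enumerate(STATES)]


def islands_by_change_flag(rows):
    """Return (state, min_date, max_date) per consecutive run of one state."""
    ordered = sorted(rows)  # ORDER BY dt
    states = [state for _dt, state in ordered]
    # Query #1: 1 where the state differs from the previous row (lag), else 0.
    change_flags = [
        int(i > 0 and states[i] != states[i - 1]) for i in range(len(states))
    ]
    # Query #2: the running total of the flags numbers the logical groups.
    group_ids = list(accumulate(change_flags))
    groups = {}
    for (dt, state), gid in zip(ordered, group_ids):
        groups.setdefault(gid, []).append((dt, state))
    # Main query: one output row per group; members are already in date order.
    return [
        (members[0][1], members[0][0], members[-1][0])
        for _gid, members in sorted(groups.items())
    ]


for row in islands_by_change_flag(rows):
    print(row)
```

Unlike the row-number difference, the running total is unique per run, so grouping by the group id alone is enough.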
OK, I changed the query, but it does not seem to work:
with t2 as
(select t1.*,
case lag(state,1,state) over (order by dt)
when state then 0 else 1 end as state_chng
from t1),
t3 as
(select t2.*,
sum(state_chng) over (order by dt) as group_id
from t2)
select distinct state,
min(dt) over (partition by GROUP_ID) as min_dt,
max(dt) over (partition by GROUP_ID) as max_dt
from t3
order by 2;
| STATE | MIN_DT | MAX_DT |
|-------|--------------------------------|--------------------------------|
| AA | January, 01 2000 00:00:00+0000 | January, 01 2003 00:00:00+0000 |
| BB | January, 01 2004 00:00:00+0000 | January, 01 2008 00:00:00+0000 |
| AA | January, 01 2009 00:00:00+0000 | January, 01 2012 00:00:00+0000 |
| BB | January, 01 2013 00:00:00+0000 | January, 01 2014 00:00:00+0000 |
| AA | January, 01 2015 00:00:00+0000 | January, 01 2018 00:00:00+0000 |
Without a stored procedure, analytic functions are the only way to achieve this.
WITH s1 AS (
SELECT eid
, dt
, state
, CASE WHEN LAG(state)
OVER (PARTITION BY eid
ORDER BY dt)
= state
THEN NULL
ELSE dt
END mindt
, CASE WHEN LEAD(state)
OVER (PARTITION BY eid
ORDER BY dt)
= state
THEN NULL
ELSE dt
END maxdt
FROM t1
), s2 as (
select eid
, state
, MAX(mindt)
OVER (PARTITION BY eid
ORDER BY dt)
mindt
, MAX(maxdt)
OVER (PARTITION BY eid
ORDER BY dt)
maxdt
FROM s1
)
SELECT eid
, state
, mindt
, MAX(maxdt) maxdt
FROM s2
GROUP BY eid
, state
, mindt
ORDER BY eid
, mindt
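The query above comes without an explanation. As a reading aid, here is a rough Python sketch of what its steps appear to do: mark the rows where a run starts or ends (the `LAG`/`LEAD` comparisons in `s1`), carry the latest start date forward (the running `MAX(mindt)` in `s2`), and emit one row per run. The function name `islands_by_boundary_markers` is hypothetical, and the sketch assumes a single employee, so the `PARTITION BY eid` part is left out.

```python
from datetime import date

# Sample data from the question as (dt, state) pairs for employee 1.
STATES = ["AA"] * 4 + ["BB"] * 4 + ["AA"] * 4
rows = [(date(2000 + i, 1, 1), state) for i, state in enumerate(STATES)]


def islands_by_boundary_markers(rows):
    """Return (state, mindt, maxdt) per consecutive run of one state."""
    ordered = sorted(rows)  # ORDER BY dt
    last_index = len(ordered) - 1
    result = []
    current_start = None
    for i, (dt, state) in enumerate(ordered):
        # s1.mindt is set when the previous row (LAG) has a different state.
        starts_run = i == 0 or ordered[i - 1][1] != state
        # s1.maxdt is set when the next row (LEAD) has a different state.
        ends_run = i == last_index or ordered[i + 1][1] != state
        if starts_run:
            current_start = dt  # s2: the latest run start seen so far
        if ends_run:
            result.append((state, current_start, dt))
    return result


for row in islands_by_boundary_markers(rows):
    print(row)
```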