How to write a query to attach a row number (1 to n) to each record within each group
I have a dataset like the one below:
|date|flag|
|20190503|0|
|20190504|1|
|20190505|1|
|20190506|1|
|20190507|1|
|20190508|0|
|20190509|0|
|20190510|0|
|20190511|1|
|20190512|1|
|20190513|0|
|20190514|0|
|20190515|1|
What I want to achieve is to group the consecutive dates where flag=1 and add a counter column: 1 for the first day of each run of consecutive flag=1 days, 2 for the second day, and so on. Rows with flag=0 should get 0.
|date|flag|counter|
|20190503|0|0|
|20190504|1|1|
|20190505|1|2|
|20190506|1|3|
|20190507|1|4|
|20190508|0|0|
|20190509|0|0|
|20190510|0|0|
|20190511|1|1|
|20190512|1|2|
|20190513|0|0|
|20190514|0|0|
|20190515|1|1|
I tried analytic functions and a hierarchical query, but still haven't found a solution. Any hint is appreciated!
Thanks, Hong
You can define the groups using a cumulative sum of the zeros. Then use row_number():
select t.*,
       (case when flag = 0 then 0
             -- partition by flag as well, so the flag = 0 row that opens each grp is not counted
             else row_number() over (partition by grp, flag order by date)
        end) as counter
from (select t.*,
             sum(case when flag = 0 then 1 else 0 end) over (order by date) as grp
      from t
     ) t;
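If you want to try the cumulative-sum idea locally before running it on your real table, here is a minimal sketch using Python's built-in sqlite3 module. The table name t, the column name dt (used instead of date) and the ISO date strings are my assumptions for the test, and it needs an SQLite build with window functions (3.25 or later). It partitions by both grp and flag, so the flag = 0 row that opens each group does not take row number 1.

```python
import sqlite3

# Sample rows from the question; ISO date strings sort in date order.
ROWS = [
    ("2019-05-03", 0), ("2019-05-04", 1), ("2019-05-05", 1), ("2019-05-06", 1),
    ("2019-05-07", 1), ("2019-05-08", 0), ("2019-05-09", 0), ("2019-05-10", 0),
    ("2019-05-11", 1), ("2019-05-12", 1), ("2019-05-13", 0), ("2019-05-14", 0),
    ("2019-05-15", 1),
]

QUERY = """
select dt, flag,
       case when flag = 0 then 0
            else row_number() over (partition by grp, flag order by dt)
       end as counter
from (select dt, flag,
             -- running count of zeros: every flag = 0 row starts a new grp
             sum(case when flag = 0 then 1 else 0 end) over (order by dt) as grp
      from t
     ) s
order by dt
"""

def run_counter_query(rows):
    """Load rows into an in-memory table t and return (dt, flag, counter) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (dt text, flag integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

counters = [counter for _, _, counter in run_counter_query(ROWS)]
print(counters)
```

On the sample data this reproduces the counter column from the question.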
A very different approach is to take the difference between the current date and the cumulative maximum of the flag = 0 dates:
select t.*,
datediff(day,
max(case when flag = 0 then date end) over (order by date),
date
) as counter
from t;
Note that the logic of these two approaches is different, although they should produce the same results for the data you have provided. If dates are missing from the table, the first approach simply ignores the gaps and keeps numbering the rows that exist. The second counts calendar days, so the counter also advances over the missing dates.
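To make that difference concrete, here is a small sketch on made-up data in which 2019-05-05 is missing. It runs on SQLite through Python's sqlite3 module. SQLite has no datediff, so I am assuming julianday() subtraction as a stand-in for it; the table name t and column name dt are also assumptions.

```python
import sqlite3

# Hypothetical rows: 2019-05-05 is deliberately missing from the table.
ROWS = [("2019-05-03", 0), ("2019-05-04", 1), ("2019-05-06", 1), ("2019-05-07", 0)]

# Approach 1: number the rows that exist inside each run of flag = 1.
ROW_NUMBER_SQL = """
select case when flag = 0 then 0
            else row_number() over (partition by grp, flag order by dt)
       end as counter
from (select dt, flag,
             sum(case when flag = 0 then 1 else 0 end) over (order by dt) as grp
      from t) s
order by dt
"""

# Approach 2: calendar days since the latest flag = 0 date.
# julianday() stands in for datediff(day, ...), which SQLite does not provide.
DATE_DIFF_SQL = """
select cast(julianday(dt)
            - julianday(max(case when flag = 0 then dt end) over (order by dt))
            as integer) as counter
from t
order by dt
"""

def counters(sql):
    """Run one of the queries against a fresh in-memory copy of ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (dt text, flag integer)")
    conn.executemany("insert into t values (?, ?)", ROWS)
    values = [row[0] for row in conn.execute(sql)]
    conn.close()
    return values

by_row_number = counters(ROW_NUMBER_SQL)
by_date_diff = counters(DATE_DIFF_SQL)
print(by_row_number)
print(by_date_diff)
```

For the two flagged days the row_number version gives 1 and 2, while the date-difference version gives 1 and 3, because it also counts the missing day.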
Well, Vertica has a very nice CONDITIONAL_CHANGE_EVENT() function that could help you there.
Every time the expression between the brackets changes, an integer is incremented by 1. This gives you a new group identifier, or a criterion to PARTITION BY, every time flag changes. So use one SELECT to get the grouping info, and then partition by the obtained grouping info. Here goes:
WITH
input(dt,flag) AS (
SELECT '2019-05-03'::DATE,0
UNION ALL SELECT '2019-05-04'::DATE,1
UNION ALL SELECT '2019-05-05'::DATE,1
UNION ALL SELECT '2019-05-06'::DATE,1
UNION ALL SELECT '2019-05-07'::DATE,1
UNION ALL SELECT '2019-05-08'::DATE,0
UNION ALL SELECT '2019-05-09'::DATE,0
UNION ALL SELECT '2019-05-10'::DATE,0
UNION ALL SELECT '2019-05-11'::DATE,1
UNION ALL SELECT '2019-05-12'::DATE,1
UNION ALL SELECT '2019-05-13'::DATE,0
UNION ALL SELECT '2019-05-14'::DATE,0
UNION ALL SELECT '2019-05-15'::DATE,1
)
,
grp_input AS (
SELECT
*
, CONDITIONAL_CHANGE_EVENT(flag) OVER(ORDER BY dt) AS grp
FROM input
)
SELECT
dt
, flag
, CASE flag
WHEN 0 THEN 0
ELSE ROW_NUMBER() OVER(PARTITION BY grp ORDER BY dt)
END AS counter
FROM grp_input;
-- out dt | flag | counter
-- out ------------+------+---------
-- out 2019-05-03 | 0 | 0
-- out 2019-05-04 | 1 | 1
-- out 2019-05-05 | 1 | 2
-- out 2019-05-06 | 1 | 3
-- out 2019-05-07 | 1 | 4
-- out 2019-05-08 | 0 | 0
-- out 2019-05-09 | 0 | 0
-- out 2019-05-10 | 0 | 0
-- out 2019-05-11 | 1 | 1
-- out 2019-05-12 | 1 | 2
-- out 2019-05-13 | 0 | 0
-- out 2019-05-14 | 0 | 0
-- out 2019-05-15 | 1 | 1
-- out (13 rows)
-- out
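If you are not on Vertica, the same grouping can probably be approximated with LAG plus a running sum: flag a row whenever its value differs from the previous row, then sum those flags. This is a sketch of that idea and not Vertica's own implementation. It runs on SQLite through Python's sqlite3 module, and the table name input_data is my assumption.

```python
import sqlite3

# Same sample rows as in the query above.
ROWS = [
    ("2019-05-03", 0), ("2019-05-04", 1), ("2019-05-05", 1), ("2019-05-06", 1),
    ("2019-05-07", 1), ("2019-05-08", 0), ("2019-05-09", 0), ("2019-05-10", 0),
    ("2019-05-11", 1), ("2019-05-12", 1), ("2019-05-13", 0), ("2019-05-14", 0),
    ("2019-05-15", 1),
]

QUERY = """
with marked as (
    select dt, flag,
           -- 1 when flag differs from the previous row; the first row has no
           -- previous row, so the comparison is NULL and it falls through to 0
           case when flag <> lag(flag) over (order by dt) then 1 else 0 end as changed
    from input_data
),
grp_input as (
    select dt, flag,
           sum(changed) over (order by dt) as grp  -- running count of changes
    from marked
)
select dt, flag, grp,
       case flag when 0 then 0
            else row_number() over (partition by grp order by dt)
       end as counter
from grp_input
order by dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table input_data (dt text, flag integer)")
conn.executemany("insert into input_data values (?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

grps = [row[2] for row in result]
counters = [row[3] for row in result]
print(grps)
print(counters)
```

Because grp changes on every switch of flag, each group holds only rows with the same flag value, so partitioning by grp alone is enough here.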