How to write a query that attaches a row number (1 to n) to each record within each group

I have a dataset like the one below:

|date|flag|
|20190503|0|
|20190504|1|
|20190505|1|
|20190506|1|
|20190507|1|
|20190508|0|
|20190509|0|
|20190510|0|
|20190511|1|
|20190512|1|
|20190513|0|
|20190514|0|
|20190515|1|

What I want to achieve is to group the consecutive dates where flag=1 and add a counter column: 1 for the first day of each run of consecutive flag=1 days, 2 for the second day, and so on, with 0 for every row where flag=0.

|date|flag|counter|
|20190503|0|0|
|20190504|1|1|
|20190505|1|2|
|20190506|1|3|
|20190507|1|4|
|20190508|0|0|
|20190509|0|0|
|20190510|0|0|
|20190511|1|1|
|20190512|1|2|
|20190513|0|0|
|20190514|0|0|
|20190515|1|1|

I tried analytic functions and hierarchical queries but still haven't found a solution. Any hint is appreciated!

Thanks, Hong

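You can define the groups using a cumulative sum of the zeros. Then use row_number(), partitioned by both the group and the flag, so that the flag = 0 row that opens each group does not take row number 1:

select t.*,
       (case when flag = 0 then 0
             -- flag is part of the partition: the opening flag = 0 row shares grp with the ones that follow it
             else row_number() over (partition by grp, flag order by date)
        end) as counter
from (select t.*,
             sum(case when flag = 0 then 1 else 0 end) over (order by date) as grp
      from t
     ) t;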
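
To try the cumulative-sum idea without a database server, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later). The table name t, the column name dt, the ISO-formatted text dates, and the helper run_query are choices made for this sketch, not something from the question. It partitions by grp and flag together, because the flag = 0 row that starts each group would otherwise take row number 1.

```python
import sqlite3

# Sample rows from the question, with dates written as ISO text (an assumption).
ROWS = [
    ("2019-05-03", 0), ("2019-05-04", 1), ("2019-05-05", 1),
    ("2019-05-06", 1), ("2019-05-07", 1), ("2019-05-08", 0),
    ("2019-05-09", 0), ("2019-05-10", 0), ("2019-05-11", 1),
    ("2019-05-12", 1), ("2019-05-13", 0), ("2019-05-14", 0),
    ("2019-05-15", 1),
]

QUERY = """
select dt, flag, grp,
       case when flag = 0 then 0
            else row_number() over (partition by grp, flag order by dt)
       end as counter
from (select dt, flag,
             -- running count of flag = 0 rows, current row included
             sum(case when flag = 0 then 1 else 0 end) over (order by dt) as grp
      from t
     ) as t
order by dt
"""

def run_query():
    """Load the sample rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (dt text, flag integer)")
    conn.executemany("insert into t values (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in run_query():
    print(row)  # (dt, flag, grp, counter)
```

Printing grp next to counter makes it easy to see that each flag = 0 row starts a new group and that the flag = 1 rows after it inherit the same group number.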

A very different approach is to take the difference between the current date and a cumulative max of the flag = 0 date:

select t.*,
       datediff(day,
                max(case when flag = 0 then date end) over (order by date),
                date
               ) as counter
from t;

Note that the two approaches use different logic, although they should produce the same results for the data you have provided. If dates are missing, the first simply numbers the rows that exist, while the second counts calendar days, so its counter jumps across the gap. The second also returns NULL for any flag = 1 rows that come before the first flag = 0 row.
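
To make the difference concrete, here is a small Python sketch that restates both approaches procedurally and runs them on a sample with one missing date. The function names and the sample rows are hypothetical and only illustrate the behaviour described above; they are not a translation of the SQL.

```python
from datetime import date

def counter_by_row_number(rows):
    """Count rows within each run of flag = 1; gaps between dates are ignored."""
    out, run = [], 0
    for _, flag in rows:
        run = run + 1 if flag == 1 else 0
        out.append(run)
    return out

def counter_by_date_diff(rows):
    """Days since the most recent flag = 0 date (None if there is none yet)."""
    out, last_zero = [], None
    for day, flag in rows:
        if flag == 0:
            last_zero = day
        out.append(None if last_zero is None else (day - last_zero).days)
    return out

# Hypothetical sample, sorted by date: 2019-05-06 is missing.
rows = [
    (date(2019, 5, 3), 0),
    (date(2019, 5, 4), 1),
    (date(2019, 5, 5), 1),
    (date(2019, 5, 7), 1),
]

print(counter_by_row_number(rows))  # [0, 1, 2, 3]
print(counter_by_date_diff(rows))   # [0, 1, 2, 4]
```

Which result is right depends on whether a missing date should break, or count toward, a run of consecutive days.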

Well, Vertica has a very nice CONDITIONAL_CHANGE_EVENT() function that could help you there.

Every time the value of the expression between the brackets changes from one row to the next, an integer is incremented by 1. This gives you a new group identifier, and so a criterion to PARTITION BY, every time the flag changes. So: one SELECT to get the grouping info, and then partition by the grouping info obtained. Here goes:

WITH
input(dt,flag) AS (
          SELECT '2019-05-03'::DATE,0
UNION ALL SELECT '2019-05-04'::DATE,1
UNION ALL SELECT '2019-05-05'::DATE,1
UNION ALL SELECT '2019-05-06'::DATE,1
UNION ALL SELECT '2019-05-07'::DATE,1
UNION ALL SELECT '2019-05-08'::DATE,0
UNION ALL SELECT '2019-05-09'::DATE,0
UNION ALL SELECT '2019-05-10'::DATE,0
UNION ALL SELECT '2019-05-11'::DATE,1
UNION ALL SELECT '2019-05-12'::DATE,1
UNION ALL SELECT '2019-05-13'::DATE,0
UNION ALL SELECT '2019-05-14'::DATE,0
UNION ALL SELECT '2019-05-15'::DATE,1
)
,
grp_input AS (
  SELECT
    *
  , CONDITIONAL_CHANGE_EVENT(flag) OVER(ORDER BY dt) AS grp
  FROM input
)
SELECT
  dt
, flag
, CASE flag
    WHEN 0 THEN 0
    ELSE ROW_NUMBER() OVER(PARTITION BY grp ORDER BY dt)
  END AS counter
FROM grp_input;
-- out      dt     | flag | counter 
-- out ------------+------+---------
-- out  2019-05-03 |    0 |       0
-- out  2019-05-04 |    1 |       1
-- out  2019-05-05 |    1 |       2
-- out  2019-05-06 |    1 |       3
-- out  2019-05-07 |    1 |       4
-- out  2019-05-08 |    0 |       0
-- out  2019-05-09 |    0 |       0
-- out  2019-05-10 |    0 |       0
-- out  2019-05-11 |    1 |       1
-- out  2019-05-12 |    1 |       2
-- out  2019-05-13 |    0 |       0
-- out  2019-05-14 |    0 |       0
-- out  2019-05-15 |    1 |       1
-- out (13 rows)
-- out 
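
CONDITIONAL_CHANGE_EVENT() is specific to Vertica. On other engines, the same grouping can be approximated with LAG() and a running SUM(). The sketch below is one way to do that, run through Python's built-in sqlite3 module, and it assumes an SQLite build with window-function support (3.25 or later). The table name input_rows, the column name changed, and the helper run_query are hypothetical names introduced for this sketch.

```python
import sqlite3

# Sample rows from the question, with dates written as ISO text (an assumption).
ROWS = [
    ("2019-05-03", 0), ("2019-05-04", 1), ("2019-05-05", 1),
    ("2019-05-06", 1), ("2019-05-07", 1), ("2019-05-08", 0),
    ("2019-05-09", 0), ("2019-05-10", 0), ("2019-05-11", 1),
    ("2019-05-12", 1), ("2019-05-13", 0), ("2019-05-14", 0),
    ("2019-05-15", 1),
]

QUERY = """
WITH marked AS (
  SELECT dt, flag,
         -- 1 when flag differs from the previous row's flag; 0 otherwise,
         -- including the first row, where LAG() returns NULL
         CASE WHEN flag <> LAG(flag) OVER (ORDER BY dt) THEN 1 ELSE 0 END AS changed
  FROM input_rows
),
grp_input AS (
  SELECT dt, flag,
         -- the running total of change markers plays the role of grp
         SUM(changed) OVER (ORDER BY dt) AS grp
  FROM marked
)
SELECT dt, flag, grp,
       CASE flag
         WHEN 0 THEN 0
         ELSE ROW_NUMBER() OVER (PARTITION BY grp ORDER BY dt)
       END AS counter
FROM grp_input
ORDER BY dt
"""

def run_query():
    """Load the sample rows into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input_rows (dt TEXT, flag INTEGER)")
    conn.executemany("INSERT INTO input_rows VALUES (?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

for row in run_query():
    print(row)  # (dt, flag, grp, counter)
```

Because grp changes whenever the flag changes, every partition holds rows with a single flag value, so ROW_NUMBER() starts at 1 on the first flag = 1 row of each run.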
