
Hive - Fetching the cumulative sum with previous column value condition

Input Table t1:

[image not available]

Output required:

[image not available]

Detailed description: If both fg and x are "Carry", then z should equal the value of z calculated for the previous row plus 1. Otherwise z should be 0. In the example, for the first row the previous calculated value of z is taken as 0 and then incremented by 1, since both fg and x are "Carry".

In the second row, both fg and x are "Carry" and the calculated value for the previous row is 1, so incrementing it by 1 gives 2.

In the third row, fg and x are not both "Carry", so z is 0.

I have tried SUM(), LAST_VALUE() and similar functions, but nothing seems to work in this case. I am basically trying to replicate the behavior of the SAS RETAIN statement in Hive. Any help is greatly appreciated.

Note: Ordering is done using id column.
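The rule described above can be stated procedurally. The following is a minimal Python sketch of the intended behavior, not Hive code; the function name `running_carry_count` and the sample rows (including the value "Other") are hypothetical and only illustrate the rule, assuming the rows are already ordered by id.

```python
def running_carry_count(rows):
    """Return z for each (fg, x) pair, assuming rows are already ordered by id."""
    z_values = []
    previous_z = 0  # value "retained" from the previous row; 0 before the first row
    for fg, x in rows:
        if fg == "Carry" and x == "Carry":
            previous_z += 1  # continue the current run
        else:
            previous_z = 0   # any other combination resets the counter
        z_values.append(previous_z)
    return z_values


# Hypothetical sample data, not the table from the question.
sample_rows = [
    ("Carry", "Carry"),
    ("Carry", "Carry"),
    ("Other", "Other"),
    ("Carry", "Carry"),
]
print(running_carry_count(sample_rows))  # [1, 2, 0, 1]
```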

You can define the groups using a cumulative sum that increments on every row where fg and x are not both 'Carry', then use row_number() within each group. In the following code, ? stands for the column that specifies the ordering:

select t.*,
       (case when fg = 'Carry' and x = 'Carry'
             then row_number() over (partition by id, grp, fg, x order by ?)
             else 0
        end) as z
from (select t.*,
             sum(case when fg = 'Carry' and x = 'Carry' then 0 else 1 end) over (partition by id order by ?) as grp
      from t
     ) t;

Here is a db<>fiddle. Note that this uses Postgres instead of Hive, but that should not make a difference.
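To check the grouping idea without a Hive cluster, the query can be run against an in-memory SQLite database from Python. This is a sketch under several assumptions: SQLite (3.25 or later, for window functions) stands in for Hive, the sample rows are hypothetical, and, following the note in the question that ordering is done by id, the windows here order by id and omit the `partition by id` clause from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (id integer, fg text, x text)")

# Hypothetical sample data, not the table from the question.
conn.executemany(
    "insert into t1 values (?, ?, ?)",
    [
        (1, "Carry", "Carry"),
        (2, "Carry", "Carry"),
        (3, "Other", "Other"),
        (4, "Carry", "Carry"),
        (5, "Carry", "Other"),
        (6, "Carry", "Carry"),
        (7, "Carry", "Carry"),
    ],
)

query = """
select id,
       case when fg = 'Carry' and x = 'Carry'
            then row_number() over (partition by grp, fg, x order by id)
            else 0
       end as z
from (select t1.*,
             -- grp increases by 1 on every row that breaks a run of Carry/Carry
             sum(case when fg = 'Carry' and x = 'Carry' then 0 else 1 end)
                 over (order by id) as grp
      from t1) t
order by id
"""

z_values = [z for _id, z in conn.execute(query)]
print(z_values)
conn.close()
```

Each Carry/Carry run shares one grp value with the row that ended the previous run, which is why fg and x are also part of the row_number() partition: they keep the resetting row out of the numbering.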

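You could create a variable and check whether fg and x are both 'Carry': if so, increment the variable, otherwise reset it to 0. Note that the following uses MySQL-style user variables rather than Hive syntax: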

SELECT id, fg, x, if(fg='Carry' and x = 'Carry', @a:=@a+1, @a:=0) as z from t1, (SELECT @a:= 0) as a;
