Get the last value of a partition group in a Hive query, but with additional requirements
Say I've got 3 columns in a table: id, flag, time. Flag can only be one of three values: A1, A2, B.
id  flag  time
1   A1    2016-01-01
1   A2    2016-01-02
1   B     2016-01-03
1   B     2016-01-04
2   A1    2016-01-02
2   B     2016-01-03
2   A2    2016-01-04
2   B     2016-01-05
The data is sorted by time within each id. Now, for each id and each row where flag equals B, I'd like to get the last non-B flag (and its time) before that row, e.g.:
1   B  2016-01-03  A2  2016-01-02
1   B  2016-01-04  A2  2016-01-02
2   B  2016-01-03  A1  2016-01-02
2   B  2016-01-05  A2  2016-01-04
Is this even possible in a Hive query?
Use the max window function to get the running maximum time of the non-B flags. Then join this result back to the original table to get the flag at that max time, i.e. the latest non-B row before each B row for a given id.
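SELECT X.*,
       T.FLAG AS LAST_NON_B_FLAG
FROM (SELECT T.*,
             -- running max of non-B times; B rows yield NULL, which MAX ignores
             MAX(CASE WHEN FLAG <> 'B' THEN TIME END)
               OVER (PARTITION BY ID ORDER BY TIME) AS MAX_TIME_BEFORE_B
      FROM T
     ) X
-- join back on (id, max time) to fetch the flag recorded at that time
JOIN T ON T.ID = X.ID AND T.TIME = X.MAX_TIME_BEFORE_B
WHERE X.FLAG = 'B'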
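To make the two steps of the join-based query above concrete, here is a minimal Python emulation of its logic (not Hive itself). It assumes times are ISO date strings, which sort chronologically, and that (id, time) is unique, so the join back matches exactly one row. The function name `last_non_b_via_join` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (id, flag, time).
ROWS = [
    (1, "A1", "2016-01-01"),
    (1, "A2", "2016-01-02"),
    (1, "B", "2016-01-03"),
    (1, "B", "2016-01-04"),
    (2, "A1", "2016-01-02"),
    (2, "B", "2016-01-03"),
    (2, "A2", "2016-01-04"),
    (2, "B", "2016-01-05"),
]


def last_non_b_via_join(rows):
    """Emulate: running MAX of non-B times per id, then join back on (id, time)."""
    # Step 1: MAX(CASE WHEN flag <> 'B' THEN time END) OVER (PARTITION BY id ORDER BY time)
    with_max = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        running_max = None  # like SQL NULL: no non-B row seen yet
        for _, flag, time in group:
            if flag != "B" and (running_max is None or time > running_max):
                running_max = time
            with_max.append((id_, flag, time, running_max))

    # Step 2: inner join on (id, time); assumes (id, time) is unique in the table.
    flag_at = {(id_, time): flag for id_, flag, time in rows}
    result = []
    for id_, flag, time, max_time in with_max:
        # WHERE flag = 'B'; B rows with no earlier non-B row find no join partner
        if flag == "B" and (id_, max_time) in flag_at:
            result.append((id_, flag, time, flag_at[(id_, max_time)], max_time))
    return result


for row in last_non_b_via_join(ROWS):
    print(row)
```

Note that, as an inner join, this drops any B row that has no earlier non-B row in its partition.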
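Alternatively, take the running max of a named_struct whose first field is time, so the matching flag comes along with the latest non-B time and no self-join is needed:

select  id
       ,flag
       ,time
       ,A.flag as A_flag
       ,A.time as A_time
from   (select  id
               ,flag
               ,time
               ,max
                (
                    case
                        when flag <> 'B'
                        then named_struct ('time',time,'flag',flag)
                    end
                ) over
                (
                    partition by id
                    order by     time
                    rows         unbounded preceding
                ) as A
        from    t
        ) t
where   flag = 'B'
;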
+----+------+------------+--------+------------+
| id | flag | time | a_flag | a_time |
+----+------+------------+--------+------------+
| 1 | B | 2016-01-03 | A2 | 2016-01-02 |
| 1 | B | 2016-01-04 | A2 | 2016-01-02 |
| 2 | B | 2016-01-03 | A1 | 2016-01-02 |
| 2 | B | 2016-01-05 | A2 | 2016-01-04 |
+----+------+------------+--------+------------+
P.S. Avoid using time as a column name, especially for a date column.
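The named_struct query above relies on structs being compared field by field, time first. A minimal Python emulation of the same idea (not Hive itself) uses tuples, which Python also compares element by element. As before, it assumes ISO date strings, and the function name `last_non_b_via_struct` is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (id, flag, time).
ROWS = [
    (1, "A1", "2016-01-01"),
    (1, "A2", "2016-01-02"),
    (1, "B", "2016-01-03"),
    (1, "B", "2016-01-04"),
    (2, "A1", "2016-01-02"),
    (2, "B", "2016-01-03"),
    (2, "A2", "2016-01-04"),
    (2, "B", "2016-01-05"),
]


def last_non_b_via_struct(rows):
    """Emulate MAX(named_struct('time', time, 'flag', flag)) over ROWS UNBOUNDED PRECEDING."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        best = None  # running max of (time, flag) over non-B rows; None plays SQL NULL
        for _, flag, time in group:
            if flag != "B":
                candidate = (time, flag)  # time compared first, flag only breaks ties
                best = candidate if best is None else max(best, candidate)
            else:
                # No join: a B row with no earlier non-B row keeps NULL A_flag / A_time.
                a_flag, a_time = (best[1], best[0]) if best else (None, None)
                result.append((id_, flag, time, a_flag, a_time))
    return result


for row in last_non_b_via_struct(ROWS):
    print(row)
```

Unlike the join-based sketch, this keeps a B row that has no earlier non-B row, with NULL-like values in the last two columns.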