
Hive: How to find a value based on the previous row's value?

I have IoT-style data. I need to replace every "none" with the value from the most recent earlier time whose value is not "none".

Original data:

+---------------------+--------+
|     time            | value  |
+---------------------+--------+
| 2020-01-01 11:11:10 | "0.3"  |
| 2020-01-01 11:11:11 | "0.2"  |
| 2020-01-01 11:11:12 | "none" |
| 2020-01-01 11:11:13 | "none" |
| 2020-01-01 11:11:14 | "none" |
| 2020-01-01 11:11:15 | "0.1"  |
| 2020-01-01 11:11:16 | "none" |
| 2020-01-01 11:11:17 | "0.4"  |
+---------------------+--------+


The final data should look like this:

+---------------------+--------+
|     time            | value  |
+---------------------+--------+
| 2020-01-01 11:11:10 | "0.3"  |
| 2020-01-01 11:11:11 | "0.2"  |
| 2020-01-01 11:11:12 | "0.2"  |
| 2020-01-01 11:11:13 | "0.2"  |
| 2020-01-01 11:11:14 | "0.2"  |
| 2020-01-01 11:11:15 | "0.1"  |
| 2020-01-01 11:11:16 | "0.1"  |
| 2020-01-01 11:11:17 | "0.4"  |
+---------------------+--------+


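Let me assume that the "none" value is really NULL. Then you want LAG(IGNORE NULLS), but Hive does not support that. You can still do this in two steps: identify groups by counting the number of "real" (non-NULL) values up to each row, then use a window function to assign each group's value: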

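-- grp counts the non-NULL values seen so far, so each real value and the
-- NULL rows that follow it share one group number.
select t.*,
       max(value) over (partition by grp) as filled_value
from (select t.*,
             count(value) over (order by time) as grp
      from t
     ) t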
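
To see what the intermediate grp column looks like, here is a minimal sketch that runs the same query on the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for Hive, on the assumption that SQLite (3.25 or later, for window functions) evaluates this particular query the same way; the column alias filled is made up for the illustration.

```python
import sqlite3

# Rows from the question; "none" is stored as NULL (None), per the assumption above.
rows = [
    ("2020-01-01 11:11:10", "0.3"),
    ("2020-01-01 11:11:11", "0.2"),
    ("2020-01-01 11:11:12", None),
    ("2020-01-01 11:11:13", None),
    ("2020-01-01 11:11:14", None),
    ("2020-01-01 11:11:15", "0.1"),
    ("2020-01-01 11:11:16", None),
    ("2020-01-01 11:11:17", "0.4"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (time text, value text)")
conn.executemany("insert into t values (?, ?)", rows)

# Same two-step query: count() skips NULLs, so grp only increases on real values.
query = """
select t.time, t.value, t.grp,
       max(value) over (partition by grp) as filled
from (select t.*,
             count(value) over (order by time) as grp
      from t
     ) t
order by time
"""
result = conn.execute(query).fetchall()
for time, value, grp, filled in result:
    print(time, value, grp, filled)
```

Each group holds exactly one non-NULL value (the one that opened it), so max() simply picks that value for every row in the group.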

EDIT:

If you are actually storing the values as strings and 'none' is a real stored value, then just use a variation of the above:

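-- nullif(value, 'none') turns the 'none' strings into NULL so that
-- count() and max() ignore them.
select t.*,
       max(nullif(value, 'none')) over (partition by grp) as filled_value
from (select t.*,
             count(nullif(value, 'none')) over (order by time) as grp
      from t
     ) t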
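
A similar sketch for the string variant, again using Python's sqlite3 as a stand-in for Hive (an assumption, as is the alias filled). The first row, at 11:11:09, is not in the question; it is added here only to show that a 'none' with no earlier real value lands in group 0 and comes out as NULL.

```python
import sqlite3

# 'none' is stored as a literal string here. The 11:11:09 row is hypothetical.
rows = [
    ("2020-01-01 11:11:09", "none"),
    ("2020-01-01 11:11:10", "0.3"),
    ("2020-01-01 11:11:11", "0.2"),
    ("2020-01-01 11:11:12", "none"),
    ("2020-01-01 11:11:13", "none"),
    ("2020-01-01 11:11:14", "none"),
    ("2020-01-01 11:11:15", "0.1"),
    ("2020-01-01 11:11:16", "none"),
    ("2020-01-01 11:11:17", "0.4"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (time text, value text)")
conn.executemany("insert into t values (?, ?)", rows)

# nullif(value, 'none') yields NULL for 'none', so count() and max() skip it.
query = """
select t.time, t.grp,
       max(nullif(value, 'none')) over (partition by grp) as filled
from (select t.*,
             count(nullif(value, 'none')) over (order by time) as grp
      from t
     ) t
order by time
"""
filled_rows = conn.execute(query).fetchall()
for time, grp, filled in filled_rows:
    print(time, grp, filled)
```

If a leading gap should keep the string 'none' rather than become NULL, wrapping the result in coalesce(..., 'none') would be one option.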

Your question is similar to "Using COALESCE in Hive to replace NULL values with values from the same column".

There is a subtle difference:

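-- Assumes missing readings are NULL. The running total is taken over an
-- indicator (1 for a real value, 0 for NULL) rather than over value itself,
-- so a reading of 0 still starts a new group.
with rank_table as (
select *,
       SUM(CASE WHEN value IS NOT NULL THEN 1 ELSE 0 END)
         OVER (ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rnk
  from your_table
)
select *, max(value) over (partition by rnk) as filled_value
  from rank_table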
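
One caveat with a running-total key: if the total is taken over the readings themselves, a reading of 0 leaves the total unchanged and is absorbed into the previous group. The sketch below compares the two keys, using Python's sqlite3 as a stand-in for Hive (an assumption). The integer readings and the names run, value_key and indicator_key are hypothetical, chosen only to make the effect visible.

```python
import sqlite3

# Hypothetical integer readings, including a genuine 0, with NULL for gaps.
rows = [
    ("2020-01-01 11:11:10", 3),
    ("2020-01-01 11:11:11", None),
    ("2020-01-01 11:11:12", 0),
    ("2020-01-01 11:11:13", None),
    ("2020-01-01 11:11:14", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (time text, value integer)")
conn.executemany("insert into your_table values (?, ?)", rows)

def run(key_expr):
    """Fill gaps using key_expr as the running-total summand; return filled values."""
    query = f"""
    with rank_table as (
      select *,
             sum({key_expr}) over (order by time
                 rows between unbounded preceding and current row) as rnk
      from your_table
    )
    select max(value) over (partition by rnk) as filled
    from rank_table
    order by time
    """
    return [row[0] for row in conn.execute(query)]

# Key built from the readings: the 0 does not change the total.
value_key = run("value")
# Key built from an indicator: every real reading bumps the total by 1.
indicator_key = run("case when value is not null then 1 else 0 end")

print(value_key)
print(indicator_key)
```

With the value-based key the 0 reading and the gap after it are filled with 3; with the indicator key they keep 0, which is what a forward fill should produce.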

