Getting min, max, mean, std from an array field in Hive with SQL
I have a Hive table like this:
CREATE EXTERNAL TABLE spare_table(
id int,
value array<float>,
value2 array<float>
)
stored as orc tblproperties ("orc.compress"="SNAPPY");
and this data:
+---+----------+----------+
| id| value| value2|
+---+----------+----------+
| 1|[1.0, 2.0]| [9.0]|
| 2|[1.0, 2.0]| [9.0]|
| 3| [9.0]|[1.0, 2.0]|
| 4|[1.0, 2.0]|[1.0, 2.0]|
+---+----------+----------+
I want to use SQL in Hive to get the min, max, mean, and std of the value and value2 array fields. I expect a result like this:
+---+----------+----------+---+---+---+---+----+----+----+----+
| id| value| value2|min|max|avg|std|min2|max2|avg2|std2|
+---+----------+----------+---+---+---+---+----+----+----+----+
| 1|[1.0, 2.0]| [9.0]|1.0|2.0|1.5|0.5| 9.0| 9.0| 9.0| 0.0|
| 2|[1.0, 2.0]| [9.0]|1.0|2.0|1.5|0.5| 9.0| 9.0| 9.0| 0.0|
| 3| [9.0]|[1.0, 2.0]|9.0|9.0|9.0|0.0| 1.0| 2.0| 1.5| 0.5|
| 4|[1.0, 2.0]|[1.0, 2.0]|1.0|2.0|1.5|0.5| 1.0| 2.0| 1.5| 0.5|
+---+----------+----------+---+---+---+---+----+----+----+----+
I tried:
select id,min(value),max(value),AVG(value),stddev(value),min(value2),max(value2),AVG(value2),stddev(value2) from feature_info
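It fails with this error:
FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array is passed.
I don't know how to get these values from an array field. Can anyone help?
Update: for some reason I cannot use lateral views. Is there any way to do this directly on the array field?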
Can you try the following? It should help:
hive> desc array_test;
OK
c1 int
c2 array<decimal(1,0)>
Time taken: 0.09 seconds, Fetched: 2 row(s)
hive> select * from array_test;
OK
1 [1,2,3]
2 [2,3,4]
Time taken: 0.14 seconds, Fetched: 2 row(s)
hive> select c1, c22 from (select c1, c2 from array_test) a lateral view explode(a.c2) exploded as c22;
OK
1 1
1 2
1 3
2 2
2 3
2 4
hive> with res1 as (select c1, c22 from (select c1, c2 from array_test) a lateral view explode(a.c2) exploded as c22) select c1, min(c22), max(c22), avg(c22) from res1 group by c1;
OK
1 1 3 2.0000
2 2 4 3.0000
You can also go through https://community.hortonworks.com/questions/222388/hive-split-for-columns.html
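Applied to the table from the question, the same explode-then-aggregate idea could be sketched as below. This is only a sketch: it assumes `id` is unique in `spare_table` and that `lateral view` is available, and the aliases (`t`, `a`, `b`, `min_v`, and so on) are illustrative names, not from the original post. Each array is exploded and aggregated in its own subquery, because exploding both arrays in one query would cross-multiply their elements. `stddev_pop` is the population standard deviation, which is what gives the expected 0.5 for [1.0, 2.0].

```sql
-- Sketch only: assumes id is unique in spare_table.
select t.id, t.value, t.value2,
       a.min_v, a.max_v, a.avg_v, a.std_v,
       b.min_v as min_v2, b.max_v as max_v2, b.avg_v as avg_v2, b.std_v as std_v2
from spare_table t
join (
  -- one row per element of value, aggregated back to one row per id
  select id, min(v) as min_v, max(v) as max_v, avg(v) as avg_v, stddev_pop(v) as std_v
  from spare_table lateral view explode(value) e as v
  group by id
) a on t.id = a.id
join (
  -- same aggregation for value2
  select id, min(v) as min_v, max(v) as max_v, avg(v) as avg_v, stddev_pop(v) as std_v
  from spare_table lateral view explode(value2) e as v
  group by id
) b on t.id = b.id;
```

Note that `explode` emits no rows for an empty or NULL array, so with inner joins such ids drop out of the result; `lateral view outer` and left joins would keep them with NULL statistics.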
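select
*
,val_s[0] as v1_min
,val_s[size(val_s)-1] as v1_max
,val2_s[0] as v2_min
,val2_s[size(val2_s)-1] as v2_max
-- (min + max) / 2 is the midrange; it equals the mean only for arrays of at most two elements
,(val_s[0] + val_s[size(val_s)-1])/2 as avg_v1
,(val2_s[0] + val2_s[size(val2_s)-1])/2 as avg_v2
from
(select *
-- sort_array sorts ascending, so the first element is the min and the last is the max
,sort_array(value) as val_s
,sort_array(value2) as val2_s
from spare_table
) a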
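The `(min + max) / 2` expression above matches the mean only when an array has one or two elements, and the query does not return a standard deviation. Under the same restriction, a sketch that avoids lateral views can sum the elements by index and use the identity std² = mean(x²) − mean(x)². It assumes every array has exactly one or two elements, as in the sample data, and relies on Hive returning NULL for an out-of-range array index; the names `n`, `s1`, and `s2` are illustrative.

```sql
-- Sketch only: assumes each value array has 1 or 2 elements.
select id, value,
       sort_array(value)[0] as min_v,
       sort_array(value)[size(value) - 1] as max_v,
       s1 / n as avg_v,
       -- population std: sqrt(mean of squares - square of mean)
       sqrt(s2 / n - pow(s1 / n, 2)) as std_v
from (
  select id, value,
         size(value) as n,
         -- sum of elements; a missing second element counts as 0
         coalesce(cast(value[0] as double), 0.0D)
           + coalesce(cast(value[1] as double), 0.0D) as s1,
         -- sum of squared elements
         coalesce(pow(value[0], 2), 0.0D)
           + coalesce(pow(value[1], 2), 0.0D) as s2
  from spare_table
) a;
```

For [1.0, 2.0] this gives s1 = 3 and s2 = 5, so the mean is 1.5 and the std is sqrt(2.5 − 2.25) = 0.5. Longer arrays would need one more indexed term per element, and for values that are not exactly representable the difference under `sqrt` can come out slightly negative because of rounding.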