
Getting min, max, mean, and std from an array field in Hive by SQL

I have a Hive table like this:

CREATE EXTERNAL TABLE spare_table(
  id int,
  value array<float>,
  value2 array<float>
)
stored as orc tblproperties ("orc.compress"="SNAPPY");

and this data:

+---+----------+----------+
| id|     value|    value2|
+---+----------+----------+
|  1|[1.0, 2.0]|     [9.0]|
|  2|[1.0, 2.0]|     [9.0]|
|  3|     [9.0]|[1.0, 2.0]|
|  4|[1.0, 2.0]|[1.0, 2.0]|
+---+----------+----------+

I want to get the min, max, mean, and std of each array field (value and value2) in Hive by SQL. I expect a result like this:

+---+----------+----------+---+---+---+---+----+----+----+----+
| id|     value|    value2|min|max|avg|std|min2|max2|avg2|std2|
+---+----------+----------+---+---+---+---+----+----+----+----+
|  1|[1.0, 2.0]|     [9.0]|1.0|2.0|1.5|0.5| 9.0| 9.0| 9.0| 0.0|
|  2|[1.0, 2.0]|     [9.0]|1.0|2.0|1.5|0.5| 9.0| 9.0| 9.0| 0.0|
|  3|     [9.0]|[1.0, 2.0]|9.0|9.0|9.0|0.0| 1.0| 2.0| 1.5| 0.5|
|  4|[1.0, 2.0]|[1.0, 2.0]|1.0|2.0|1.5|0.5| 1.0| 2.0| 1.5| 0.5|
+---+----------+----------+---+---+---+---+----+----+----+----+

I tried:

select id,min(value),max(value),AVG(value),stddev(value),min(value2),max(value2),AVG(value2),stddev(value2) from feature_info

It shows this error:

FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but array is passed.

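I don't know how to get these values from an array field. Can anyone help me?

Update: for some reason I can't use lateral views. Is there any way to do this directly on the array field?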

Can you try the following? It should help:

hive> desc array_test;
OK
c1                      int
c2                      array<decimal(1,0)>
Time taken: 0.09 seconds, Fetched: 2 row(s)
hive> select * from array_test;
OK
1       [1,2,3]
2       [2,3,4]
Time taken: 0.14 seconds, Fetched: 2 row(s)

hive> select c1, c22 from (select c1, c2 from array_test) a lateral view explode(a.c2) exploded as c22;
OK
1       1
1       2
1       3
2       2
2       3
2       4

hive> with res1 as (select c1, c22 from (select c1, c2 from array_test) a lateral view explode(a.c2) exploded as c22) select c1, min(c22), max(c22), avg(c22) from res1 group by c1;
OK
1       1       3       2.0000
2       2       4       3.0000

You can also look at https://community.hortonworks.com/questions/222388/hive-split-for-columns.html
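The session above covers a single array column and leaves out the standard deviation. A minimal sketch of the same lateral view approach applied to the question's table might look like the following. It assumes the table is spare_table as in the CREATE statement, and the aliases v1, v2, e, x, min1 ... std2 are made up for illustration. Each array is exploded in its own subquery so that the two arrays are not cross-joined, and stddev_pop is used because the expected 0.5 for [1.0, 2.0] is the population standard deviation.

```sql
-- Sketch only: per-row statistics for two array columns via lateral view.
-- Assumes table spare_table(id, value, value2) from the question.
with v1 as (
  select id,
         min(x)        as min1,
         max(x)        as max1,
         avg(x)        as avg1,
         stddev_pop(x) as std1
  from spare_table
  lateral view explode(value) e as x
  group by id
),
v2 as (
  select id,
         min(x)        as min2,
         max(x)        as max2,
         avg(x)        as avg2,
         stddev_pop(x) as std2
  from spare_table
  lateral view explode(value2) e as x
  group by id
)
select t.id, t.value, t.value2,
       v1.min1, v1.max1, v1.avg1, v1.std1,
       v2.min2, v2.max2, v2.avg2, v2.std2
from spare_table t
join v1 on t.id = v1.id
join v2 on t.id = v2.id;
```

Grouping by id assumes id is unique per row. Rows whose array is empty or NULL produce no exploded rows, so the inner joins would drop them; a left join would keep those ids with NULL statistics.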

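-- Sort each array once; the first and last elements are then the min and max.
select
*
,val_s[0] as v1_min
,val_s[size(val_s)-1] as v1_max
,val2_s[0] as v2_min
,val2_s[size(val2_s)-1] as v2_max
-- Note: (min + max) / 2 is the midrange; it equals the mean only for arrays of one or two elements.
,(val_s[0] + val_s[size(val_s)-1])/2 as avg_v1
,(val2_s[0] + val2_s[size(val2_s)-1] ) /2 as avg_v2
from
(select *
        ,sort_array(value) as val_s
        ,sort_array(value2) as val2_s
        from spare_table
) a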
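The query above gives min and max without a lateral view, but not a true mean or a standard deviation. If the arrays are known to be short, one possible way to get those as well is to sum the elements by index. The sketch below is an illustration, not a general solution: it assumes every array has at most two elements (true for the sample data only), that an out-of-range index such as value[1] on a one-element array yields NULL rather than an error, and that the table is spare_table. The names n, s, sq, avg1 and std1 are hypothetical.

```sql
-- Sketch only: mean and population std without lateral view,
-- assuming each array has at most 2 elements.
select
  id,
  s / n                           as avg1,  -- sum / count
  sqrt(sq / n - pow(s / n, 2))    as std1   -- sqrt(E[x^2] - (E[x])^2)
from (
  select
    id,
    size(value) as n,
    -- missing positions contribute 0 to the sum
    coalesce(value[0], cast(0 as float))
      + coalesce(value[1], cast(0 as float)) as s,
    -- missing positions contribute 0 to the sum of squares
    coalesce(pow(value[0], 2), cast(0 as double))
      + coalesce(pow(value[1], 2), cast(0 as double)) as sq
  from spare_table
) a;
```

For longer arrays one more coalesce term is needed per index, so this only makes sense when the maximum length is small and fixed. With arbitrary floating-point data, rounding can make the expression under sqrt slightly negative, so the result should be checked before relying on it.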

