
Weighting Factor Calculation with Hive UDF

I'm new to Hive and would like some help writing a UDF for weighting factor calculation.

The calculation seems simple.

I have a table with KEY and VALUE columns whose rows are grouped by GROUP_ID. For each row of a group, I want to calculate its weighting factor: a float between 0 and 1 that is the weight of that element within the group. The weighting factors within a group must sum to 1.

In this example the value is a distance, so the weight is inversely proportional to the distance.

GROUP_ID | KEY     | VALUE(DISTANCE)
====================================
1          10        4
1          11        3
1          12        2
2          13        1
2          14        5
3          ..        ..
...

Math function, where X_i is the VALUE of row i, w_i is its weighting factor, and the sum runs over the N rows of its group:

$$w_i = \frac{1}{X_i \sum_{k=1}^{N} \frac{1}{X_k}}$$
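Let w_i be the weighting factor of row i. Rewriting the formula shows that each weight is the row's inverse distance divided by the group's total inverse distance. This form also shows why the weights in a group sum to 1. It assumes every X_k > 0, because a zero distance would make the weight undefined:

$$
w_i = \frac{1}{X_i \sum_{k=1}^{N} \frac{1}{X_k}}
    = \frac{1/X_i}{\sum_{k=1}^{N} 1/X_k},
\qquad
\sum_{i=1}^{N} w_i = \frac{\sum_{i=1}^{N} 1/X_i}{\sum_{k=1}^{N} 1/X_k} = 1
$$

For group 2, for example, the group sum is 1/1 + 1/5 = 1.2, giving 1/1.2 ≈ 0.83 and (1/5)/1.2 ≈ 0.17. The only per-group quantity needed is therefore the sum of 1/X_k. Hive versions with windowing support may be able to compute it without any custom function, along the lines of `(1.0 / value) / SUM(1.0 / value) OVER (PARTITION BY group_id)`.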

GROUP_ID | KEY |   VALUE    | WEIGHTING_FACTOR
=======================================================
1          10      4        1/(4*(1/4+1/3+1/2)) = 0.23
1          11      3        1/(3*(1/4+1/3+1/2)) = 0.31
1          12      2        1/(2*(1/4+1/3+1/2)) = 0.46
2          13      1        1/(1*(1/1+1/5)) = 0.83
2          14      5        1/(5*(1/1+1/5)) = 0.17
3          ..      ..
...
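The arithmetic in the table above can be checked with a two-pass computation in plain Python. The first pass sums 1/VALUE per GROUP_ID. The second pass divides each row's 1/VALUE by its group's sum. This is only a local sketch, not Hive code: the function name `weighting_factors` is hypothetical, and it assumes every VALUE is greater than 0. It does mirror the two steps any Hive solution has to perform.

```python
from collections import defaultdict


def weighting_factors(rows):
    """Return (group_id, key, value, weight) for each (group_id, key, value) row.

    weight = 1 / (value * sum of 1/value over the row's group).
    Hypothetical helper; assumes all values are > 0 and rows is a list.
    """
    # First pass: total inverse distance per group.
    inv_sum = defaultdict(float)
    for group_id, _key, value in rows:
        inv_sum[group_id] += 1.0 / value
    # Second pass: normalize each row by its group's total inverse distance.
    return [
        (group_id, key, value, 1.0 / (value * inv_sum[group_id]))
        for group_id, key, value in rows
    ]


# The first two groups from the example table.
example = [
    (1, 10, 4),
    (1, 11, 3),
    (1, 12, 2),
    (2, 13, 1),
    (2, 14, 5),
]

for group_id, key, value, weight in weighting_factors(example):
    print(group_id, key, value, round(weight, 2))
```

Rounded to two decimals, this prints 0.23, 0.31, 0.46, 0.83 and 0.17 for keys 10 to 14, matching the table.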

Do you have a suggestion on whether to use a UDF, a UDAF, or a UDTF for this?
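As background for that choice, the three function types take different shapes. A UDF maps one row to one value, so on its own it cannot see the rest of the group. A UDAF reduces many rows to one value. A UDTF expands one row into many. The per-group sum of 1/VALUE is therefore a natural fit for an aggregate, but the final per-row division still needs a second step, such as joining the sum back onto the rows.

The sketch below is a plain-Python model of that split, not Hive's Java API and not loadable by Hive. Its method names loosely mirror the phases of Hive's UDAF evaluators: `iterate`, `terminatePartial`, `merge`, and `terminate`. The class `InverseSumAggregate` and the two-mapper split are hypothetical.

```python
class InverseSumAggregate:
    """Plain-Python model of an aggregate computing sum(1 / value) for one group.

    Method names loosely mirror Hive's UDAF evaluator phases
    (iterate, terminatePartial, merge, terminate); illustration only.
    """

    def __init__(self):
        self.total = 0.0  # running sum of 1 / value seen so far

    def iterate(self, value):
        # Map side: consume one raw row's value.
        self.total += 1.0 / value

    def terminate_partial(self):
        # Map side: emit a partial sum to be shipped to a reducer.
        return self.total

    def merge(self, partial):
        # Reduce side: fold in a partial sum from another mapper.
        self.total += partial

    def terminate(self):
        # Reduce side: the final aggregate for the group.
        return self.total


# Group 1 split across two hypothetical mappers: values (4, 3) and (2,).
mapper_a, mapper_b = InverseSumAggregate(), InverseSumAggregate()
for v in (4, 3):
    mapper_a.iterate(v)
mapper_b.iterate(2)

reducer = InverseSumAggregate()
reducer.merge(mapper_a.terminate_partial())
reducer.merge(mapper_b.terminate_partial())
s = reducer.terminate()  # 1/4 + 1/3 + 1/2

# Second step (e.g. after joining s back onto the rows): weight of the row with value 4.
print(round(1.0 / (4 * s), 2))
```

The aggregate's result is a single number per group, so it cannot produce WEIGHTING_FACTOR by itself. That is why the division is shown as a separate step.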

Or should I use a TRANSFORM script instead? https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform
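With TRANSFORM, Hive streams rows to an external script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The weighting can therefore be done in the script, provided all rows of a group reach the same script instance consecutively. Below is a minimal Python sketch under those assumptions. The file name `weights.py`, the table name `distances`, and the three-column input order (group_id, key, value) are all hypothetical. NULL values, which Hive sends as `\N`, are not handled.

```python
import sys
from itertools import groupby


def transform(lines):
    """Yield 'group_id\tkey\tvalue\tweighting_factor' output lines.

    Assumes tab-separated input with exactly three columns, where all rows of
    a group arrive contiguously (e.g. via CLUSTER BY group_id), and value > 0.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for group_id, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)  # buffer one group so it can be read twice
        inv_sum = sum(1.0 / float(value) for _, _, value in group)
        for _, key, value in group:
            weight = 1.0 / (float(value) * inv_sum)
            yield f"{group_id}\t{key}\t{value}\t{weight:.6f}"


if __name__ == "__main__":
    # Hive feeds rows on stdin and collects output rows from stdout.
    for out in transform(sys.stdin):
        print(out)
```

It could then be wired up roughly as `ADD FILE weights.py;` followed by `SELECT TRANSFORM(group_id, key, value) USING 'python3 weights.py' AS (group_id, key, value, weighting_factor) FROM (SELECT group_id, key, value FROM distances CLUSTER BY group_id) t;`. This assumes `python3` is available on the worker nodes. Each group is buffered in the script's memory, so a single very large group could be a problem.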
