Java/Spark - Group by weighted avg aggregation
Data:
id | sector | balance
---|------------|--------
1 | restaurant | 20000
2 | restaurant | 20000
3 | auto | 10000
4 | auto | 10000
5 | auto | 10000
I want to load this into Spark as a DataFrame and group by sector, computing sum(balance) per group, but I also need the balance percentage of each group relative to the total balance (sum(balance) over all ids).
How can I do this?
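A minimal sketch of the desired computation using the DataFrame API (column names `id`, `sector`, `balance` are taken from the sample data; the session setup and the single-row total aggregate are assumptions, adjust to your environment):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().appName("balance-pct").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "restaurant", 20000), (2, "restaurant", 20000),
  (3, "auto", 10000), (4, "auto", 10000), (5, "auto", 10000)
).toDF("id", "sector", "balance")

// Grand total over all ids, pulled to the driver as a scalar.
val total = df.agg(sum($"balance")).first().getLong(0).toDouble

// Per-sector sum plus its share of the grand total.
val result = df.groupBy($"sector")
  .agg(sum($"balance").as("balance_sum"))
  .withColumn("balance_pct", $"balance_sum" * 100 / total)
```

With the sample data this would yield restaurant at roughly 57.14% and auto at roughly 42.86% of the 70000 total.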
To get the percentage relative to the total, you can use DoubleRDDFunctions:
// data is assumed to be an RDD of (id, sector, balance) tuples.
// Grand total of all balances (sum() comes from DoubleRDDFunctions).
val totalBalance = data.map(_._3.toDouble).sum()

// Each row's balance as a percentage of the grand total.
val percentageRow = data.map(d => d._3 * 100 / totalBalance)

// Per-sector balance sum, then each sum as a percentage of the grand total.
val percentageGroup = data.map(d => (d._2, d._3))
  .reduceByKey((x, y) => x + y)
  .mapValues(sumGroup => sumGroup * 100 / totalBalance)
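The same arithmetic can be checked without a Spark cluster; this is a plain Scala collections sketch that mirrors the RDD logic above (the `data` list reproduces the sample table):

```scala
val data = List((1, "restaurant", 20000), (2, "restaurant", 20000),
                (3, "auto", 10000), (4, "auto", 10000), (5, "auto", 10000))

// Grand total: 70000.0
val totalBalance = data.map(_._3.toDouble).sum

// Per-row percentage of the grand total.
val percentageRow = data.map(d => d._3 * 100 / totalBalance)

// Per-sector sum, then its percentage of the grand total
// (groupBy on a List stands in for reduceByKey on an RDD).
val percentageGroup = data
  .groupBy(_._2)
  .map { case (sector, rows) => sector -> rows.map(_._3).sum * 100 / totalBalance }
```

Here `percentageGroup` maps restaurant to 40000 * 100 / 70000 ≈ 57.14 and auto to 30000 * 100 / 70000 ≈ 42.86.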
Notice: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.