
Java/Spark - Group by weighted avg aggregation

Data:

id | sector     | balance
---------------------------
1  | restaurant | 20000
2  | restaurant | 20000
3  | auto       | 10000
4  | auto       | 10000
5  | auto       | 10000

I am looking to load this into Spark as a DataFrame and calculate per-group balance sums, but I also need to calculate each group's balance as a percentage of the total balance (sum(balance) across all ids).

How can I accomplish this?

To get the percentage against the total you can use DoubleRDDFunctions (its sum() becomes available on an RDD[Double] through an implicit conversion). Assuming data is an RDD of (id, sector, balance) tuples:

// total balance across all rows
val totalBalance = data.map(_._3.toDouble).sum()

// each row's balance as a % of the total
val percentageRow = data.map(d => d._3 * 100 / totalBalance)

// per-sector balance sums, then each sum as a % of the total
val percentageGroup = data.map(d => (d._2, d._3))
  .reduceByKey((x, y) => x + y)
  .mapValues(sumGroup => sumGroup * 100 / totalBalance)
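The arithmetic this RDD pipeline performs can be sanity-checked on plain Scala collections before running it on a cluster. A minimal sketch using the sample data from the question (the variable names here are illustrative, not from the original post):

```scala
// Sample rows as (id, sector, balance) tuples, mirroring the question's table.
val data = List(
  (1, "restaurant", 20000),
  (2, "restaurant", 20000),
  (3, "auto", 10000),
  (4, "auto", 10000),
  (5, "auto", 10000)
)

// Total balance across all ids.
val totalBalance = data.map(_._3).sum.toDouble

// Group balances by sector, sum each group, and express the sum
// as a percentage of the overall total -- the same computation
// reduceByKey + mapValues performs on the RDD.
val percentageByGroup: Map[String, Double] = data
  .groupBy(_._2)
  .map { case (sector, rows) =>
    sector -> rows.map(_._3).sum * 100 / totalBalance
  }
```

With a DataFrame instead of an RDD, the equivalent is a `df.groupBy("sector").agg(sum("balance"))` followed by dividing each group sum by the grand total.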
