
Spark Scala sum of values by unique key

If I have key,value pairs that comprise an item (key) and its sales (value):

bolt 45
bolt 5
drill 1
drill 1
screw 1
screw 2
screw 3

So I want to obtain an RDD where each element is the sum of the values for every unique key:

bolt 50
drill 2
screw 6

My current code looks like this:

val salesRDD = sc.textFile("/user/bigdata/sales.txt")
val pairs = salesRDD.map(s => (s, 1))
val counts = pairs.reduceByKey((a, b) => a + b)
counts.collect().foreach(println)

But my results look like this:

(bolt 5,1)
(drill 1,2)
(bolt 45,1)
(screw 2,1)
(screw 3,1)
(screw 1,1)

How should I edit my code to get the above result?
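
For reference, here is a minimal Scala sketch of one way to fix this with reduceByKey, assuming each line of sales.txt is a whitespace-separated item name and sales amount as in the sample above. The key change is to split each line into an (item, amount) pair instead of mapping the whole line to 1:

val salesRDD = sc.textFile("/user/bigdata/sales.txt")

// Split each line into an (item, amount) pair instead of (line, 1).
// Assumes every line has exactly two whitespace-separated fields.
val pairs = salesRDD.map { line =>
  val Array(item, amount) = line.trim.split("\\s+")
  (item, amount.toInt)
}

// Sum the amounts for each unique item.
val sums = pairs.reduceByKey(_ + _)
sums.collect().foreach(println)   // e.g. (bolt,50), (drill,2), (screw,6), in some order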

Java way, hope you can convert this to Scala. It looks like you just need a groupBy followed by a sum over the sales column (a plain count() would count the rows per item, not add up the sales):

  salesDF.groupBy(salesDF.col("name")).agg(functions.sum("sales")).show();

+-----+----------+
| name|sum(sales)|
+-----+----------+
| bolt|        50|
|drill|         2|
|screw|         6|
+-----+----------+

Also, please use Datasets and DataFrames rather than RDDs. You will find them a lot handier.
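
As a rough Scala sketch of that DataFrame approach, assuming the same whitespace-separated sales.txt and that the aggregation you want is a sum of the sales column rather than a row count:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder().appName("sales-sum").getOrCreate()

// Read the file as two single-space-separated columns and name them.
// Assumes the separator is exactly one space per line.
val salesDF = spark.read
  .option("sep", " ")
  .csv("/user/bigdata/sales.txt")
  .toDF("name", "sales")

// Group by item name and sum the sales amounts.
salesDF
  .groupBy("name")
  .agg(sum(col("sales").cast("int")).as("total_sales"))
  .show()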
