I am trying to reduceByKeys in Scala, is there any method to reduce the values based on the keys in Scala. [ i know we can do by reduceByKey method in spark, but how do we do the same in Scala ? ]
The input Data is :
val File = Source.fromFile("C:/Users/svk12/git/data/retail_db/order_items/part-00000")
.getLines()
.toList
val map = File.map(x => x.split(","))
.map(x => (x(1),x(4)))
map.take(10).foreach(println)
After Above Step i am getting the result as:
(2,250.0)
(2,129.99)
(4,49.98)
(4,299.95)
(4,150.0)
(4,199.92)
(5,299.98)
(5,299.95)
Expected Result :
(2,379.99)
(5,499.93)
.......
It looks like you want the sum of some values from a file. One problem is that files are strings, so you have to cast the String
to a number format before it can be summed.
These are the steps you might use.
io.Source.fromFile("so.txt") //open file
.getLines() //read line-by-line
.map(_.split(",")) //each line is Array[String]
.toSeq //to something that can groupBy()
.groupBy(_(1)) //now is Map[String,Array[String]]
.mapValues(_.map(_(4).toInt).sum) //now is Map[String,Int]
.toSeq //un-Map it to (String,Int) tuples
.sorted //presentation order
.take(10) //sample
.foreach(println) //report
This will, of course, throw if any file data is not in the required format.
Starting Scala 2.13
, you can use the groupMapReduce
method which is (as its name suggests) an equivalent of a groupBy
followed by mapValues
and a reduce
step:
io.Source.fromFile("file.txt")
.getLines.to(LazyList)
.map(_.split(','))
.groupMapReduce(_(1))(_(4).toDouble)(_ + _)
The groupMapReduce
stage:
group
s splited arrays by their 2nd element ( _(1)
) (group part of group MapReduce)
map
s each array occurrence within each group to its 4th element and cast it to Double
( _(4).toDouble
) (map part of group Map Reduce)
reduce
s values within each group ( _ + _
) by summing them (reduce part of groupMap Reduce ).
This is a one-pass version of what can be translated by:
seq.groupBy(_(1)).mapValues(_.map(_(4).toDouble).reduce(_ + _))
Also note the cast from Iterator
to LazyList
in order to use a collection which provides groupMapReduce
(we don't use a Stream
, since starting Scala 2.13
, LazyList
is the recommended replacement of Stream
s).
There is nothing built-in, but you can write it like this:
def reduceByKey[A, B](items: Traversable[(A, B)])(f: (B, B) => B): Map[A, B] = {
var result = Map.empty[A, B]
items.foreach {
case (a, b) =>
result += (a -> result.get(a).map(b1 => f(b1, b)).getOrElse(b))
}
result
}
There is some space to optimize this (eg use mutable maps), but the general idea remains the same.
Another approach, more declarative but less efficient (creates several intermediate collections; can be rewritten but with loss of clarity:
def reduceByKey[A, B](items: Traversable[(A, B)])(f: (B, B) => B): Map[A, B] = {
items
.groupBy { case (a, _) => a }
.mapValues(_.map { case (_, b) => b }.reduce(f))
// mapValues returns a view, view.force changes it back to a realized map
.view.force
}
First group the tuple using key, first element here and then reduce. Following code will work -
val reducedList = map.groupBy(_._1).map(l => (l._1, l._2.map(_._2).reduce(_+_)))
print(reducedList)
Here another solution using a foldLeft:
val File : List[String] = ???
File.map(x => x.split(","))
.map(x => (x(1),x(4).toInt))
.foldLeft(Map.empty[String,Int]){case (state, (key,value)) => state.updated(key,state.get(key).getOrElse(0)+value)}
.toSeq
.sortBy(_._1)
.take(10)
.foreach(println)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.