简体   繁体   中英

How to trick Scala map method to produce more than one output per each input item?

Quite complex algorith is being applied to list of Spark Dataset's rows (list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1: 1 from input to output, but in some scenarios require more than one output per each input. The input row schema can change anytime. The map() fits the requirements quite well for the 1:1 transformation, but is there a way to use it producing 1: n output?

The only work-around I found relies on foreach method which has unpleasant overhed cause by creating the initial empty list (remember, unlike the simplified example below, real-life list structure is changing randomly).

My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square value and if the input is even it should also transform into one half of the original value:

val X = Seq(1, 2, 3, 4, 5)

val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here

val z = X.map(x => for(n <- 1 to 5) (n, x * x)) //this attempt FAILS - generates list of five rows with emtpy tuples

// this work-around works, but newX definition is problematic
var newX = List[Int]() //in reality defining as head of the input list and dropping result's tail at the end
val za = X.foreach(x => {
  newX = x*x :: newX
  if(x % 2 == 0) newX = (x / 2) :: newX
})

newX

Is there a better way than foreach construct?

.flatMap produces any number of outputs from a single input.

val X = Seq(1, 2, 3, 4, 5)

X.flatMap { x => 
  if (x % 2  == 0) Seq(x*x, x / 2) else Seq(x / 2) 
}
#=> Seq[Int] = List(0, 4, 1, 1, 16, 2, 2)

flatMap in more detail

In X.map(f) , f is a function that maps each input to a single output. By contrast, in X.flatMap(g) , the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element in f ) and concatenates them.

The neat thing is .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option . Similarly, Future(x)#flatMap(g) will allow g to return a Future.

Whenever the number of elements you return depends on the input, you should think of flatMap .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM