How to trick the Scala map method into producing more than one output per input item?

A fairly complex algorithm is applied to a list of Spark Dataset rows (the list is obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change at any time. map() fits the 1:1 transformation well, but is there a way to use it to produce 1:n output?

The only workaround I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember, unlike the simplified example below, the real-life list structure changes randomly).

My original problem is too complex to share here, but this example demonstrates the concept. Let's have a list of integers. Each should be transformed into its square, and if the input is even it should also be transformed into one half of the original value:

val X = Seq(1, 2, 3, 4, 5)

val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here

val z = X.map(x => for(n <- 1 to 5) (n, x * x)) //this attempt FAILS: a for loop without yield returns Unit, so the result is a list of five () values

// this work-around works, but newX definition is problematic
var newX = List[Int]() //in reality defining as head of the input list and dropping result's tail at the end
val za = X.foreach(x => {
  newX = x*x :: newX
  if(x % 2 == 0) newX = (x / 2) :: newX
})

newX

Is there a better way than the foreach construct?

.flatMap produces any number of outputs from a single input.

val X = Seq(1, 2, 3, 4, 5)

// every input yields its square; even inputs also yield half of the original value
X.flatMap { x =>
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
}
// => Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)

flatMap in more detail

In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element in X) and concatenates them.
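The difference can be sketched with the rule from the question (always the square, plus the half for even inputs). The names xs, g, nested and flat are chosen here for illustration only; the point is that flatMap behaves like map followed by flatten.

```scala
val xs = Seq(1, 2, 3, 4, 5)

// g returns a whole sequence per input: one element for odd inputs, two for even ones
val g: Int => Seq[Int] = x => if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)

// map keeps one (nested) result per input
val nested = xs.map(g)   // Seq(Seq(1), Seq(4, 1), Seq(9), Seq(16, 2), Seq(25))

// flatMap concatenates the per-input sequences into a single flat one
val flat = xs.flatMap(g) // Seq(1, 4, 1, 9, 16, 2, 25)

// flatMap(g) gives the same result as map(g) followed by flatten
val same = nested.flatten == flat
```

Note that the output length (7) differs from the input length (5), which map alone can never achieve.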

The neat thing is that .flatMap works not just for sequences, but for other container-like types as well. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
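A minimal sketch of the Option case follows. The helpers parsePositive and half are hypothetical, introduced only for this illustration; each returns an Option, and flatMap chains them without nesting.

```scala
import scala.util.Try

// hypothetical helper: parse a string into a positive Int, if possible
def parsePositive(s: String): Option[Int] =
  Try(s.trim.toInt).toOption.filter(_ > 0)

// hypothetical helper: halve a number only when it is even
def half(n: Int): Option[Int] =
  if (n % 2 == 0) Some(n / 2) else None

val ok       = parsePositive("8").flatMap(half) // Some(4)
val oddInput = parsePositive("7").flatMap(half) // None: 7 is not even
val badInput = parsePositive("x").flatMap(half) // None: not a number

// with map instead of flatMap the result would be nested
val nestedOpt = parsePositive("8").map(half)    // Some(Some(4))
```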

Whenever the number of elements you return depends on the input, you should think of flatMap.
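As a rough illustration, the count per input can also be zero, and the same logic can be written as a for comprehension, provided it uses yield (a for with two generators and yield is translated by the compiler into flatMap and map). The names inputs, evensOnly and viaFor are illustrative only.

```scala
val inputs = Seq(1, 2, 3, 4, 5)

// zero outputs for odd inputs, two outputs for even ones
val evensOnly = inputs.flatMap { x =>
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq.empty[Int]
}
// List(4, 1, 16, 2)

// the question's rule as a for comprehension: the second generator
// iterates over the outputs produced for each input
val viaFor = for {
  x   <- inputs
  out <- if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
} yield out
// List(1, 4, 1, 9, 16, 2, 25)
```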
