How to trick the Scala map method into producing more than one output per input item?

Question

A fairly complex algorithm is applied to lists of rows from a Spark Dataset (the lists are obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change at any time. map() fits the 1:1 transformation well, but is there a way to use it to produce 1:n output?

The only workaround I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember, unlike the simplified example below, the real-life list structure changes randomly).

My original problem is too complex to share here, but this example demonstrates the concept. Take a list of integers. Each should be transformed into its square, and if the input is even it should also be transformed into one half of the original value:

val X = Seq(1, 2, 3, 4, 5)

val y = X.map(x => x * x) //map is intended for 1:1 transformation so it works great here

val z = X.map(x => for(n <- 1 to 5) (n, x * x)) // this attempt FAILS - generates a list of five empty tuples, i.e. Unit values

// this workaround works, but the newX definition is problematic
var newX = List[Int]() // in reality defined as the head of the input list, with the result's tail dropped at the end
val za = X.foreach(x => {
  newX = x*x :: newX
  if(x % 2 == 0) newX = (x / 2) :: newX
})

newX

Is there a better way than the foreach construct?

Answer 1

.flatMap produces any number of outputs from a single input.

val X = Seq(1, 2, 3, 4, 5)

X.flatMap { x =>
  // every input yields its square; even inputs also yield half their value
  if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)
}
// => Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)

flatMap in more detail

In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element in X) and concatenates them.
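To make the "map, then concatenate" description concrete, here is a minimal sketch using the rule stated in the question (square every input, and also emit half of each even input). The names xs, g, nested, viaFlatten and viaFlatMap are illustrative and not part of the original answer.

```scala
val xs = Seq(1, 2, 3, 4, 5)

// g returns a whole sequence of outputs for each input
val g: Int => Seq[Int] =
  x => if (x % 2 == 0) Seq(x * x, x / 2) else Seq(x * x)

// map keeps one result per input, so the result is a sequence of sequences
val nested: Seq[Seq[Int]] = xs.map(g)

// flatten concatenates the inner sequences in order
val viaFlatten: Seq[Int] = nested.flatten

// flatMap does both steps at once
val viaFlatMap: Seq[Int] = xs.flatMap(g)
```

Both viaFlatten and viaFlatMap hold the same elements, which is why flatMap is often described as map followed by flatten.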

The neat thing is that .flatMap works not just for sequences, but for all sequence-like objects. For an option, for instance, Option(x)#flatMap(g) will allow g to return an Option. Similarly, Future(x)#flatMap(g) will allow g to return a Future.
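The Option case might look like the sketch below. The helpers parseInt and half are hypothetical, introduced only for illustration; the point is that each step may or may not produce a value, and flatMap chains them without nesting.

```scala
import scala.util.Try

// hypothetical helper: parsing may fail, so it returns an Option
def parseInt(s: String): Option[Int] = Try(s.trim.toInt).toOption

// hypothetical helper: only even numbers have an exact half
def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None

val ok       = parseInt("8").flatMap(half) // both steps succeed
val oddInput = parseInt("7").flatMap(half) // second step yields None
val badInput = parseInt("x").flatMap(half) // first step yields None

// with map instead of flatMap the Options would nest
val nestedOpt: Option[Option[Int]] = parseInt("8").map(half)
```

Future behaves analogously, but running it needs an ExecutionContext, so it is left out of this sketch.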

Whenever the number of elements you return depends on the input, you should think of flatMap.
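As one more illustration, the question's attempt that used a for loop inside map can be repaired along the same lines. This is a sketch, not part of the original answer: a for loop without yield returns Unit, so adding yield and switching to flatMap produces the (n, x * x) pairs. The names xs, pairs and viaFor are illustrative.

```scala
val xs = Seq(1, 2, 3, 4, 5)

// for ... yield builds a collection of tuples for each x;
// flatMap concatenates the five collections into one
val pairs: Seq[(Int, Int)] =
  xs.flatMap(x => for (n <- 1 to 5) yield (n, x * x))

// the same computation written as a single for comprehension,
// which the compiler desugars into flatMap and map calls
val viaFor: Seq[(Int, Int)] =
  for {
    x <- xs
    n <- 1 to 5
  } yield (n, x * x)
```

Each of the five inputs contributes five tuples, so both values contain 25 elements.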

Answer 1 (3, accepted, 2020-12-02 21:27:39)
