Mapping RDD to case(Schema) in Spark with Scala
I am new to Scala and Spark, and I have a little problem. I have an RDD with the following schema:
RDD[((String, String), (Int, Timestamp, String, Int))]
and I have to map this RDD to transform it like this:
RDD[(Int, String, String, String, Timestamp, Int)]
and I wrote the following code for this:
map { case ((pid, name), (id, date, code, level)) => (id, name, code, pid, date, level) }
This works fine. Now I have another RDD:
RDD[((String, String), List[(Int, Timestamp, String, Int)])]
and I want to transform it, as above, into:
RDD[(Int, String, String, String, Timestamp, Int)]
How can I do that? I have tried this code, but it does not work:
map {
  case ((pid, name), List(id, date, code, level)) => (id, name, code, pid, date, level)
}
How can it be achieved?
Your pattern `List(id, date, code, level)` only matches lists of exactly four elements, and it binds each element (a whole tuple) to a single name, so the types don't line up. What you want is to flatten the list, producing one output tuple per inner element. Is this the thing you're looking for?
val input: RDD[((String, String), List[(Int, Timestamp, String, Int)])] = ...

val output: RDD[(Int, String, String, String, Timestamp, Int)] =
  input.flatMap { case ((pid, name), list) =>
    list.map { case (id, date, code, level) =>
      (id, name, code, pid, date, level)
    }
  }
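Spark isn't needed to see the shape of this transformation: `flatMap` on an ordinary Scala `List` behaves the same way, emitting one output tuple per inner-list element. Here is a minimal local sketch with made-up sample values (the names `pid`/`name`/`id` etc. follow the question; the data is invented):

```scala
import java.sql.Timestamp

val ts = new Timestamp(0L)

// One key with a list of two records, shaped like the RDD's element type
val input: List[((String, String), List[(Int, Timestamp, String, Int)])] =
  List((("p1", "alice"), List((1, ts, "A", 10), (2, ts, "B", 20))))

// flatMap flattens the inner list: two input records become two output tuples
val output: List[(Int, String, String, String, Timestamp, Int)] =
  input.flatMap { case ((pid, name), list) =>
    list.map { case (id, date, code, level) =>
      (id, name, code, pid, date, level)
    }
  }
// output == List((1, "alice", "A", "p1", ts, 10), (2, "alice", "B", "p1", ts, 20))
```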
or using a for comprehension:
val output: RDD[(Int, String, String, String, Timestamp, Int)] = for {
  ((pid, name), list) <- input
  (id, date, code, level) <- list
} yield (id, name, code, pid, date, level)
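The for comprehension works on an RDD because RDD provides `map` and `flatMap`; the compiler desugars the two generators into exactly the `flatMap`/`map` chain above. A small local illustration of that equivalence, with invented sample data (two fields per record instead of four, to keep it short):

```scala
val pairs: List[((String, String), List[(Int, String)])] =
  List((("p1", "alice"), List((1, "A"), (2, "B"))))

// The for comprehension...
val viaFor = for {
  ((pid, name), list) <- pairs
  (id, code)          <- list
} yield (id, name, code, pid)

// ...desugars to the equivalent flatMap/map chain:
val viaFlatMap = pairs.flatMap { case ((pid, name), list) =>
  list.map { case (id, code) => (id, name, code, pid) }
}
// viaFor == viaFlatMap
```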
Try:

map {
  case ((id, name), list) => (id, name, list.flatten)
}