Mapping RDD to case(Schema) in Spark with Scala
I am new to Scala and Spark, and I have a little problem. I have an RDD with the following schema:
RDD[((String, String), (Int, Timestamp, String, Int))]
and I have to map this RDD to transform it like this:
RDD[(Int, String, String, String, Timestamp, Int)]
and I wrote the following code for this:
map { case ((pid, name), (id, date, code, level)) => (id, name, code, pid, date, level) }
This works fine. Now I have another RDD:
RDD[((String, String), List[(Int, Timestamp, String, Int)])]
and I want to transform it, as above, into:
RDD[(Int, String, String, String, Timestamp, Int)]
How can I do that? I have tried this code, but it does not work:
map {
  case ((pid, name), List(id, date, code, level)) => (id, name, code, pid, date, level)
}
How can it be achieved?
Your pattern `List(id, date, code, level)` only matches lists of exactly four elements, and it binds each element (a whole tuple) to a single name, so the types don't line up. What you want is to flatten the list, producing one output tuple per inner element. Is this the thing you're looking for?
val input: RDD[((String, String), List[(Int, Timestamp, String, Int)])] = ...

val output: RDD[(Int, String, String, String, Timestamp, Int)] =
  input.flatMap { case ((pid, name), list) =>
    list.map { case (id, date, code, level) =>
      (id, name, code, pid, date, level)
    }
  }
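Spark isn't needed to see the shape of this transformation: `flatMap` on an ordinary Scala `List` behaves the same way, emitting one output tuple per inner-list element. Here is a minimal local sketch with made-up sample values (the names `pid`/`name`/`id` etc. follow the question; the data is invented):

```scala
import java.sql.Timestamp

val ts = new Timestamp(0L)

// One key with a list of two records, shaped like the RDD's element type
val input: List[((String, String), List[(Int, Timestamp, String, Int)])] =
  List((("p1", "alice"), List((1, ts, "A", 10), (2, ts, "B", 20))))

// flatMap flattens the inner list: two input records become two output tuples
val output: List[(Int, String, String, String, Timestamp, Int)] =
  input.flatMap { case ((pid, name), list) =>
    list.map { case (id, date, code, level) =>
      (id, name, code, pid, date, level)
    }
  }
// output == List((1, "alice", "A", "p1", ts, 10), (2, "alice", "B", "p1", ts, 20))
```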
or using a for comprehension:
val output: RDD[(Int, String, String, String, Timestamp, Int)] = for {
  ((pid, name), list) <- input
  (id, date, code, level) <- list
} yield (id, name, code, pid, date, level)
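The for comprehension works on an RDD because RDD provides `map` and `flatMap`; the compiler desugars the two generators into exactly the `flatMap`/`map` chain above. A small local illustration of that equivalence, with invented sample data (two fields per record instead of four, to keep it short):

```scala
val pairs: List[((String, String), List[(Int, String)])] =
  List((("p1", "alice"), List((1, "A"), (2, "B"))))

// The for comprehension...
val viaFor = for {
  ((pid, name), list) <- pairs
  (id, code)          <- list
} yield (id, name, code, pid)

// ...desugars to the equivalent flatMap/map chain:
val viaFlatMap = pairs.flatMap { case ((pid, name), list) =>
  list.map { case (id, code) => (id, name, code, pid) }
}
// viaFor == viaFlatMap
```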
Try:

map {
  case ((id, name), list) => (id, name, list.flatten)
}