Apache Spark - Scala - how to FlatMap (k, {v1,v2,v3,…}) to ((k,v1),(k,v2),(k,v3),…)
I have this:
val vector: RDD[(String, Array[String])] = [("a", {v1,v2,..}),("b", {u1,u2,..})]
and want to convert it to:
RDD[(String, String)] = [("a",v1), ("a",v2), ..., ("b",u1), ("b",u2), ...]
Any idea how to do that using flatMap?
This:
vector.flatMap { case (x, arr) => arr.map((x, _)) }
will give you:
scala> val vector = sc.parallelize(Vector(("a", Array("b", "c")), ("b", Array("d", "f"))))
vector: org.apache.spark.rdd.RDD[(String, Array[String])] =
ParallelCollectionRDD[3] at parallelize at <console>:27
scala> vector.flatMap { case (x, arr) => arr.map((x, _)) }.collect
res4: Array[(String, String)] = Array((a,b), (a,c), (b,d), (b,f))
You definitely need to use flatMap as you mentioned, but in addition you need to use Scala's map as well.
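Putting the two together on plain Scala collections (a minimal sketch; no SparkContext here, but an RDD's `flatMap`/`map` apply the same pattern):

```scala
object FlatMapDemo extends App {
  val vector: List[(String, Array[String])] =
    List(("a", Array("v1", "v2")), ("b", Array("u1", "u2")))

  // The outer flatMap flattens the per-key collections; the inner map
  // pairs each value with its key.
  val pairs: List[(String, String)] =
    vector.flatMap { case (k, values) => values.map(v => (k, v)) }

  println(pairs) // List((a,v1), (a,v2), (b,u1), (b,u2))
}
```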
For example: 例如:
val idToVectorValue: RDD[(String, String)] = vector.flatMap { case (id, values) => values.map(value => (id, value)) }
Using a single-parameter function:
vector.flatMap(data => data._2.map((data._1, _)))
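For pair RDDs, Spark also provides `flatMapValues`, which keeps the key and flattens only the values, so `vector.flatMapValues(arr => arr)` would give the same result. A minimal sketch of the same semantics on plain Scala collections (the `flatMapValuesLocal` helper is hypothetical, written only to illustrate the behavior):

```scala
object FlatMapValuesDemo extends App {
  // Hypothetical local analog of Spark's PairRDDFunctions.flatMapValues:
  // apply f to each value and pair every resulting element with the key.
  def flatMapValuesLocal[K, V, U](pairs: List[(K, V)])(f: V => Iterable[U]): List[(K, U)] =
    pairs.flatMap { case (k, v) => f(v).map(u => (k, u)) }

  val vector = List(("a", Array("v1", "v2")), ("b", Array("u1", "u2")))
  val result = flatMapValuesLocal(vector)(_.toList)
  println(result) // List((a,v1), (a,v2), (b,u1), (b,u2))
}
```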