
Map RDD to PairRDD in Scala

I am trying to map an RDD to a pair RDD in Scala so that I can use reduceByKey later. Here is what I did:

userRecords is of type org.apache.spark.rdd.RDD[UserElement].

I tried to create a pair RDD from userRecords like this:

val userPairs: PairRDDFunctions[String, UserElement] = userRecords.map { t =>
  val nameKey: String = t.getName()
  (nameKey, t)
}

However, I got this error:

type mismatch;
 found   : org.apache.spark.rdd.RDD[(String, com.mypackage.UserElement)]
 required: org.apache.spark.rdd.PairRDDFunctions[String,com.mypackage.UserElement]

What am I missing here? Thanks a lot!

You don't need to do that, because it is done via implicits (specifically rddToPairRDDFunctions). Any RDD whose element type is Tuple2[K,V] can automatically be used as a PairRDDFunctions. If you REALLY want to, you can do explicitly what the implicit does and wrap the RDD in a PairRDDFunctions yourself:

val pair = new PairRDDFunctions(rdd)
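To make the mechanism concrete, here is a minimal pure-Scala sketch of the same pattern. It does not use Spark: MiniRDD, MiniPairFunctions and toPairFunctions are hypothetical stand-ins for RDD, PairRDDFunctions and rddToPairRDDFunctions, and the real Spark classes are considerably more involved.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for an RDD: it just wraps a Seq.
class MiniRDD[T](val items: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(items.map(f))
}

// Hypothetical stand-in for PairRDDFunctions: extra methods that only
// make sense when the elements are key/value tuples.
class MiniPairFunctions[K, V](self: MiniRDD[(K, V)]) {
  def reduceByKey(f: (V, V) => V): Map[K, V] =
    self.items.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
}

object MiniRDD {
  // Defined in the companion object, so the compiler finds it in the
  // implicit scope of MiniRDD without any import.
  implicit def toPairFunctions[K, V](rdd: MiniRDD[(K, V)]): MiniPairFunctions[K, V] =
    new MiniPairFunctions(rdd)
}

object ImplicitDemo {
  def wordCounts(words: Seq[String]): Map[String, Int] = {
    val pairs = new MiniRDD(words).map(w => (w, 1)) // static type: MiniRDD[(String, Int)]
    // MiniRDD has no reduceByKey, so the compiler rewrites this call to
    // MiniRDD.toPairFunctions(pairs).reduceByKey(_ + _)
    pairs.reduceByKey(_ + _)
  }
}
```

Note that pairs keeps its plain MiniRDD type; the wrapper is only created at the call site. In the same way, the result of map in the question can simply stay an RDD[(String, UserElement)].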

I think you are just missing the import of org.apache.spark.SparkContext._. It brings all the right implicit conversions into scope to create the pair RDD.

The example below should work (assuming you have initialized a SparkContext as sc):

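import org.apache.spark.SparkContext._
import org.apache.spark.rdd.PairRDDFunctions

val f = sc.parallelize(Array(1, 2, 3, 4, 5))
// The implicit conversion wraps the RDD[(String, Int)] in PairRDDFunctions
val g: PairRDDFunctions[String, Int] = f.map(x => (x.toString, x))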

You can also use the keyBy method. You pass it a function that extracts the key from each element.

In your example, you can simply write userRecords.keyBy(t => t.getName()).
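As a rough sketch of how keyBy could feed into reduceByKey, the example below sums a per-user counter by name. The UserElement case class and its visits field are hypothetical stand-ins for the asker's class, and the local master setting is only for trying it out. On Spark 1.3 and later the pair-RDD implicits live in the RDD companion object, so no extra import should be needed; on older versions, add import org.apache.spark.SparkContext._.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the asker's UserElement class.
case class UserElement(name: String, visits: Int) {
  def getName(): String = name
}

object KeyByExample {
  // Sums the (assumed) visits field per user name.
  def visitsByName(userRecords: RDD[UserElement]): RDD[(String, Int)] =
    userRecords
      .keyBy(t => t.getName()) // RDD[(String, UserElement)]
      .mapValues(_.visits)     // RDD[(String, Int)]
      .reduceByKey(_ + _)      // available through the implicit conversion

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("keyBy-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val userRecords = sc.parallelize(Seq(
        UserElement("ann", 1), UserElement("bob", 2), UserElement("ann", 3)))
      visitsByName(userRecords).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The static type stays RDD[(String, ...)] at every step; PairRDDFunctions never has to appear in a type annotation.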
