简体   繁体   中英

map RDD to PairRDD in Scala

I am trying to map RDD to pairRDD in scala, so I could use reduceByKey later. Here is what I did:

userRecords is of org.apache.spark.rdd.RDD[UserElement]

I try to create a pairRDD from userRecords like below:

val userPairs: PairRDDFunctions[String, UserElement] = userRecords.map { t =>
  val nameKey: String = t.getName()
  (nameKey, t)
}

However, I got the error:

type mismatch; found : org.apache.spark.rdd.RDD[(String, com.mypackage.UserElement)] required: org.apache.spark.rdd.PairRDDFunctions[String,com.mypackage.UserElement]

What am I missing here? Thanks a lot!

You don't need to do that as it is done via implicits (explicitly rddToPairRDDFunctions ). Any RDD that is of type Tuple2[K,V] can automatically be used as a PairRDDFunctions . If you REALLY want to, you can explicitly do what the implicit does and wrap the RDD in a PairRDDFunction :

val pair = new PairRDDFunctions(rdd)

I think you are just missing the import to org.apache.spark.SparkContext._ . This brings all the right implicit conversions in scope to create the PairRDD.

The example below should work (assuming you have initialized a SparkContext under sc):

import org.apache.spark.SparkContext._

val f = sc.parallelize(Array(1,2,3,4,5))
val g: PairRDDFunctions[String, Int] = f.map( x => (x.toString, x))

You can also use keyBy method, you need to provide the key in the function,

in your example, you can simply give userRecords.keyBy(t => t.getName())

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM