简体   繁体   中英

scala spark rdd joing two tables with the same id

I have the following rdds:

case class Rating(user_ID: Integer, movie_ID: Integer, rating: Integer, timestamp: String)
case class Movie(movie_ID: Integer, title: String, genre: String)

I join them together in scala, like:

val m = datamovie.keyBy(_.movie_ID)
val r = data.keyBy(_.movie_ID)
val mr = m.join(r)  

I get back my result like RDD[(Int, (Movie, Rating))] how can I print the tile of the movies that have the rating 5 for example. I am not quit sure how to work with the new rdd that was created with the join!

Convert them to spark dataframe and perform joins. Is there a specific reason you wanted to keep em RDD's

val m = datamovie.toDF
val r = data.toDF
val mr = m.join(r, Seq("movie_id"), "left").where($"rating" === "5").select($"title")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM