Scala Spark RDD: joining two tables with the same ID

Question

I have two RDDs whose elements are the following case classes:

case class Rating(user_ID: Integer, movie_ID: Integer, rating: Integer, timestamp: String)
case class Movie(movie_ID: Integer, title: String, genre: String)

I join them in Scala like this:

val m = datamovie.keyBy(_.movie_ID)
val r = data.keyBy(_.movie_ID)
val mr = m.join(r)

The result is an RDD[(Int, (Movie, Rating))]. How can I print the titles of the movies that have a rating of 5, for example? I am not quite sure how to work with the new RDD that the join creates.

Answer 1

Convert them to Spark DataFrames and join those instead. Is there a specific reason you want to keep them as RDDs?

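import spark.implicits._ // needed for toDF and the $"..." column syntax
val m = datamovie.toDF
val r = data.toDF
// An inner join is enough here: filtering on rating discards unmatched movies anyway
val mr = m.join(r, Seq("movie_ID")).where($"rating" === 5).select($"title")
mr.show()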
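If you would rather stay with RDDs, the joined RDD can be handled with pattern matching on its (key, (Movie, Rating)) tuples. The following is only a minimal sketch under some assumptions: the sample data is made up, the case class fields use Int instead of Integer for simplicity, and it is written script-style (for example for spark-shell, where getOrCreate returns the existing session).

```scala
import org.apache.spark.sql.SparkSession

// Case classes from the question; Int replaces Integer here (an assumption)
case class Rating(user_ID: Int, movie_ID: Int, rating: Int, timestamp: String)
case class Movie(movie_ID: Int, title: String, genre: String)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-join-sketch") // hypothetical app name
  .getOrCreate()
val sc = spark.sparkContext

// Made-up sample data standing in for the question's datamovie and data RDDs
val datamovie = sc.parallelize(Seq(Movie(1, "Heat", "Crime"), Movie(2, "Alien", "Horror")))
val data = sc.parallelize(Seq(Rating(10, 1, 5, "0"), Rating(11, 2, 3, "0"), Rating(12, 1, 5, "0")))

// Same join as in the question: RDD[(Int, (Movie, Rating))]
val mr = datamovie.keyBy(_.movie_ID).join(data.keyBy(_.movie_ID))

val titles = mr
  .filter { case (_, (_, rating)) => rating.rating == 5 } // keep only 5-star ratings
  .map { case (_, (movie, _)) => movie.title }            // project the movie title
  .distinct()                                             // one line per movie, not per rating
  .collect()                                              // bring the results to the driver

titles.foreach(println)
```

The collect() call moves the results to the driver before printing, which is fine for small outputs. For a large result, something like take(n) would be safer.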

solution1: 1 ACCEPTED 2018-12-16 17:08:10