
Convert RDD[Array[(String,String)]] to RDD[(String,String)] in Scala

I'm new to Scala and have tried several ways to convert an RDD[Array[(String,String)]] to an RDD[(String,String)].

What I want to achieve is to select two elements (text and category) from each JSON record. For every word in the text, I want to create a key/value pair of the form (word1, category), (word2, category), ...

My example looks like this:

import org.json4s._
import org.json4s.jackson.JsonMethods._
// Example JSON line: {"reviewText": "This was a gift!", "category": "Apps"}
val rdd = sc.textFile(PathToJSONFile)
rdd.map{    
   row =>
   val json_row = parse(row)
   val myCategory = compact(json_row \ "category").toString
   val myText = compact(json_row \ "reviewText").toString.toLowerCase.split("[#&$!]").map(_.trim).filter(_.length > 1)
   myText.map{word => (word, myCategory)}
}

The result has type org.apache.spark.rdd.RDD[Array[(String, String)]] and looks like this:

Array(Array((this,"Apps"), (was,"Apps"), (a,"Apps"), (gift,"Apps")))

What I want instead is an RDD[(String,String)] of key/value pairs, where the key is a word and the value is the category, which is the same for every word in that line.

How can I achieve this? Many thanks!

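The suggestion from Psidom solved the problem: changing rdd.map to rdd.flatMap was the solution. flatMap flattens the per-line arrays, so the result is an RDD[(String,String)] instead of an RDD[Array[(String,String)]].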
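
For illustration, the change could look roughly like the sketch below. It is not the exact code from the suggestion. The in-memory sample line stands in for sc.textFile(PathToJSONFile), and the local SparkSession is only there to make the snippet self-contained (in spark-shell, getOrCreate should simply return the existing session). Two details are assumptions made here rather than part of the original code: words are split on non-word characters ("\\W+") so that the sample line yields the word list shown in the question, and the fields are read with a JString pattern match so that they come out without the surrounding JSON quotes.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.json4s._
import org.json4s.jackson.JsonMethods._

// Local session for illustration only (assumption); reuses an existing session if there is one.
val spark = SparkSession.builder().master("local[1]").appName("flatMapSketch").getOrCreate()
val sc = spark.sparkContext

// In-memory stand-in for sc.textFile(PathToJSONFile)
val rdd = sc.parallelize(Seq(
  """{"reviewText": "This was a gift!", "category": "Apps"}"""
))

// flatMap instead of map: each line's Array[(String, String)] is flattened
// into the resulting RDD, so the element type is (String, String).
val pairs: RDD[(String, String)] = rdd.flatMap { row =>
  val jsonRow = parse(row)
  // Read the raw string values; fall back to "" if a field is missing or not a string.
  val category = (jsonRow \ "category") match {
    case JString(s) => s
    case _          => ""
  }
  val text = (jsonRow \ "reviewText") match {
    case JString(s) => s
    case _          => ""
  }
  // Assumed tokenization: lower-case, split on runs of non-word characters, drop empty tokens.
  text.toLowerCase.split("\\W+").filter(_.nonEmpty).map(word => (word, category))
}

pairs.collect().foreach(println)
```

For the sample line, this should print (this,Apps), (was,Apps), (a,Apps) and (gift,Apps), one pair per line. If the quoted form "Apps" from the question is wanted, compact(jsonRow \ "category") can be kept for the category instead of the pattern match.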

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
