I want to convert the array I get from a DataFrame created like this:
case class Student(name: String, age: Int)
val dataFrame: DataFrame = sql.createDataFrame(
  sql.sparkContext.parallelize(List(Student("Torcuato", 27), Student("Rosalinda", 34))))
When I collect the results from the DataFrame, the resulting array is an Array[org.apache.spark.sql.Row] = Array([Torcuato,27], [Rosalinda,34])
I'm looking into converting the DataFrame into an RDD[Map], e.g.:
Map("name" -> nameOfFirst, "age" -> ageOfFirst)
Map("name" -> nameOfSecond, "age" -> ageOfSecond)
I tried to use map via x._1, but that does not seem to work for Array[org.apache.spark.sql.Row].
How can I perform this transformation?
You can use the map function with pattern matching to do the job here:
import org.apache.spark.sql.Row

// Go through the underlying RDD[Row]: on Spark 2.x, Dataset.map would
// require an Encoder, and there is none for Map[String, Any]
dataFrame.rdd
  .map { case Row(name, age) => Map("name" -> name, "age" -> age) }

This will result in an RDD[Map[String, Any]].
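For example, collecting on the two-row DataFrame from the question (a quick check; the comment shows the expected output for those rows):

dataFrame.rdd
  .map { case Row(name, age) => Map("name" -> name, "age" -> age) }
  .collect()
// Array(Map(name -> Torcuato, age -> 27), Map(name -> Rosalinda, age -> 34))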
In other words, you can transform each Row of the DataFrame into a Map using its schema, and the following works:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

def dfToMapOfRdd(df: DataFrame): RDD[Map[String, Any]] = {
  val result: RDD[Map[String, Any]] = df.rdd.map { row =>
    // getValuesMap builds a Map from field name to value for the given columns
    row.getValuesMap[Any](row.schema.fieldNames)
  }
  result
}
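Called on the DataFrame from the question (a minimal usage sketch; the comments show what collect() returns for those two rows):

val maps: RDD[Map[String, Any]] = dfToMapOfRdd(dataFrame)
maps.collect().foreach(println)
// Map(name -> Torcuato, age -> 27)
// Map(name -> Rosalinda, age -> 34)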