How to convert an RDD of Maps to a DataFrame
I have an RDD of Maps and I want to convert it to a DataFrame. This is the input format of the RDD:
val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))
Is there a way to convert it into a DataFrame, for example with something like:
val df = mapRDD.toDF
df.show
empid, empName, depId
12 Rohan 201
13 Ross 201
14 Richard 401
15 Michale 501
16 John 701
You can convert it to a Spark DataFrame quite easily. Here is code that solves the problem:
val mapRDD= sc.parallelize(Seq(
Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))
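// .toDF needs the session implicits (already in scope in spark-shell; `spark` is the assumed session name)
import spark.implicits._
// Column names come from the first record; this assumes every Map has the same keys
val columns = mapRDD.take(1).flatMap(a => a.keys)
val resultantDF = mapRDD.map { value =>
  // Look each value up by its key so it lines up with its column name,
  // rather than relying on the Map's iteration order
  val list = columns.map(value)
  (list(0), list(1), list(2))
}.toDF(columns: _*)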
resultantDF.show()
The output is:
+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+
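The tuple-based approach hard-codes three columns. If the number of keys can vary, one alternative is to build Row objects against an explicit schema and call createDataFrame. The following is only a sketch under some assumptions: it is meant for spark-shell on Spark 2.x or later, where `spark` is already defined; the column list is written out by hand; every column is treated as a string; and a key missing from a record becomes null. The names `columns`, `schema`, `rowRDD` and `df` are illustrative.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: running in spark-shell, so `spark` (SparkSession) already exists
val mapRDD = spark.sparkContext.parallelize(Seq(
  Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
  Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201")))

// Fix the column order explicitly instead of reading it from a Map
val columns = Seq("empid", "empName", "depId")

// One nullable string field per column
val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))

// Build each Row by looking up the columns in order; a missing key yields null
val rowRDD = mapRDD.map(m => Row.fromSeq(columns.map(c => m.getOrElse(c, null))))

val df = spark.createDataFrame(rowRDD, schema)
df.show()
```

Because the values are looked up by column name, this works for any number of columns and does not depend on the order in which a Map iterates over its entries.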