简体   繁体   中英

How to convert an RDD of Maps to dataframe

I have RDD of Map and i want to converted it to dataframe Here is the input format of RDD

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

is there any way to convert into dataframe like

 val df=mapRDD.toDf

df.show

empid,  empName,    depId
12      Rohan       201
13      Ross        201
14      Richard     401
15      Michale     501
16      John        701

You can easily convert it into Spark DataFrame:

Here is a code that would do the trick :

val mapRDD= sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

val columns=mapRDD.take(1).flatMap(a=>a.keys)

val resultantDF=mapRDD.map{value=>
      val list=value.values.toList
      (list(0),list(1),list(2))
      }.toDF(columns:_*)

resultantDF.show()

The output is :

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM