
Converting a Spark Dataframe to a Scala Map collection

I'm trying to find the best way to convert an entire Spark DataFrame to a Scala Map collection. It is best illustrated as follows:

To go from this (in the Spark examples):

val df = sqlContext.read.json("examples/src/main/resources/people.json")

df.show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

To a Scala collection (Map of Maps) represented like this:

val people = Map(
Map("age" -> null, "name" -> "Michael"),
Map("age" -> 30, "name" -> "Andy"),
Map("age" -> 19, "name" -> "Justin")
)

I don't think your question makes sense as written: your outermost Map only has values stuffed into it, but a Map needs key/value pairs. That being said:

val peopleArray = df.collect.map(r => Map(df.columns.zip(r.toSeq):_*))

Will give you:

Array(
  Map("age" -> null, "name" -> "Michael"),
  Map("age" -> 30, "name" -> "Andy"),
  Map("age" -> 19, "name" -> "Justin")
)
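
For comparison, here is a minimal sketch of the same conversion using Row.getValuesMap instead of zipping the column names by hand. It assumes Spark 2.x or later with a local SparkSession; the object name RowsToMaps, the helper toMaps, and the inline sample data are hypothetical stand-ins for the people.json example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RowsToMaps {
  // Collect the DataFrame to the driver and turn every Row into a
  // Map from column name to value. Values are typed as Any because
  // the columns have different types and may contain null.
  def toMaps(df: DataFrame): Array[Map[String, Any]] = {
    val fieldNames = df.columns.toSeq
    df.collect().map(_.getValuesMap[Any](fieldNames))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rows-to-maps")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical inline data mirroring the question's table.
    val df = Seq[(Option[Long], String)](
      (None, "Michael"),
      (Some(30L), "Andy"),
      (Some(19L), "Justin")
    ).toDF("age", "name")

    toMaps(df).foreach(println)
    spark.stop()
  }
}
```

Either way, collect pulls every row onto the driver, so this is only reasonable for DataFrames that fit in driver memory.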

At that point you could do:

val people = Map(peopleArray.map(p => (p.getOrElse("name", null), p)):_*)

Which would give you:

Map(
  ("Michael" -> Map("age" -> null, "name" -> "Michael")),
  ("Andy" -> Map("age" -> 30, "name" -> "Andy")),
  ("Justin" -> Map("age" -> 19, "name" -> "Justin"))
)
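
One thing to watch when keying by name: if two rows share a name, only one of them survives in the resulting Map. The sketch below uses plain Scala collections as a stand-in for the collected rows (the duplicate "Andy" row and the object name GroupByName are hypothetical) and shows groupBy as one possible way to keep every row.

```scala
object GroupByName {
  // Stand-in for the collected rows; the second "Andy" is hypothetical.
  val peopleArray: Array[Map[String, Any]] = Array(
    Map("age" -> null, "name" -> "Michael"),
    Map("age" -> 30, "name" -> "Andy"),
    Map("age" -> 41, "name" -> "Andy")
  )

  // Keying directly by name: for a repeated key, the later pair
  // replaces the earlier one.
  val byName: Map[Any, Map[String, Any]] =
    peopleArray.map(p => p.getOrElse("name", null) -> p).toMap

  // Grouping keeps every row that shares a name.
  val grouped: Map[Any, Array[Map[String, Any]]] =
    peopleArray.groupBy(p => p.getOrElse("name", null))

  def main(args: Array[String]): Unit = {
    println(byName.size)            // 2
    println(grouped("Andy").length) // 2
  }
}
```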

I'm guessing this is really more what you want. If you wanted to key them on an arbitrary index instead (note that zipWithIndex on an Array yields Int positions, not Long), you can do:

val indexedPeople = Map(peopleArray.zipWithIndex.map(r => (r._2, r._1)):_*)

Which gives you:

Map(
  (0 -> Map("age" -> null, "name" -> "Michael")),
  (1 -> Map("age" -> 30, "name" -> "Andy")),
  (2 -> Map("age" -> 19, "name" -> "Justin"))
)
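
A small sketch of the index-keyed variant on plain Scala collections, in case the key type matters. zipWithIndex on a Scala Array pairs each element with an Int, so Long keys need an explicit conversion. The object name IndexRows and the inline data are stand-ins for the collected rows.

```scala
object IndexRows {
  // Stand-in for the collected rows from the example above.
  val peopleArray: Array[Map[String, Any]] = Array(
    Map("age" -> null, "name" -> "Michael"),
    Map("age" -> 30, "name" -> "Andy"),
    Map("age" -> 19, "name" -> "Justin")
  )

  // zipWithIndex produces (row, index) pairs; swap puts the index first.
  val byInt: Map[Int, Map[String, Any]] =
    peopleArray.zipWithIndex.map(_.swap).toMap

  // Widen the index explicitly if Long keys are required.
  val byLong: Map[Long, Map[String, Any]] =
    peopleArray.zipWithIndex.map { case (p, i) => i.toLong -> p }.toMap

  def main(args: Array[String]): Unit = {
    println(byInt(0)("name"))  // Michael
    println(byLong(2L)("name")) // Justin
  }
}
```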

First, get the column names from the DataFrame's schema, paired with their positions:

val schemaList = dataframe.schema.map(_.name).zipWithIndex // (column name, column index) pairs

Then get the RDD from the DataFrame and map each row to a Map:

dataframe.rdd.map(row =>
  // rec._1 is the column name and rec._2 is its index
  schemaList.map(rec => (rec._1, row(rec._2))).toMap
).collect.foreach(println)
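
If the result is needed as a single Map instead of an Array of Maps, and the DataFrame has exactly two columns (the key first, then the value):

val map = df.collect.map(a => a(0) -> a(1)).toMap.asInstanceOf[Map[String, String]]

The asInstanceOf cast is unchecked, so this is only safe when both columns really are strings. With any other column type, or with null values, the cast itself succeeds and the failure only shows up later when a value is read.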
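
One way to avoid relying on an unchecked cast is to keep only the pairs that actually are strings, using a type pattern. The sketch below is plain Scala on a stand-in for the collected (key, value) pairs; the sample values and the object name TypedPairs are hypothetical.

```scala
object TypedPairs {
  // Stand-in for df.collect.map(a => a(0) -> a(1)); values are hypothetical.
  val pairs: Array[(Any, Any)] = Array(
    ("Michael", null),
    ("Andy", "30"),
    ("Justin", 19)
  )

  // Type patterns never match null and never match a non-String value,
  // so only genuine (String, String) pairs are kept.
  val typed: Map[String, String] =
    pairs.collect { case (k: String, v: String) => k -> v }.toMap

  def main(args: Array[String]): Unit = {
    println(typed) // Map(Andy -> 30)
  }
}
```

Whether silently dropping the other pairs is acceptable depends on the data; converting values with toString or failing fast are equally valid choices.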
