
How to convert a Scala Spark Dataframe to LinkedHashMap[String, String]

Below is my dataframe:

val myDF= spark.sql("select company, comp_id from my_db.my_table")
myDF: org.apache.spark.sql.DataFrame = [company: string, comp_id: string]

The data looks like this:

+----------+---------+
|  company |comp_id  |
+----------+---------+
|macys     |     101 |
|jcpenny   |     102 |
|kohls     |     103 |
|star bucks|     104 |
|macy's    |     105 |
+----------+---------+

I'm trying to create a Map collection object (like below) in Scala from the above dataframe.

Map("macys" -> "101", "jcpenny" -> "102" ..., "macy's" -> "105")

Questions:
1) Will the order of the DataFrame records match the order of the content in the original file underlying the table?
2) If I call collect() on the DataFrame, will the order of the resulting array match the order of the content in the original file?
Explanation: When I run df.collect().map(t => t(0) -> t(1)).toMap, the resulting map does not preserve insertion order, which is also the default behaviour of a Scala Map.
res01: scala.collection.immutable.Map[Any,Any] = Map(kohls -> 103, jcpenny -> 102 ...)
3) How can I convert the DataFrame into a Scala map collection that actually preserves the insertion order / record sequence?
Explanation: LinkedHashMap is one of the Scala map types that preserves insertion order, so I'm trying to find a way to convert the DataFrame into a LinkedHashMap.

You can use LinkedHashMap. From its Scaladoc page:

"This class implements mutable maps using a hashtable. The iterator and all traversal methods of this class visit elements in the order they were inserted."
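The quoted behaviour can be checked without Spark. The following is a minimal sketch in which a plain Seq of pairs stands in for the collected rows (the values are copied from the question; the object name InsertionOrderDemo is made up for illustration):

```scala
import scala.collection.mutable.LinkedHashMap

object InsertionOrderDemo {
  // Stand-in for the (company, comp_id) pairs that collect() would return.
  val pairs: Seq[(String, String)] = Seq(
    "macys"      -> "101",
    "jcpenny"    -> "102",
    "kohls"      -> "103",
    "star bucks" -> "104",
    "macy's"     -> "105"
  )

  // A default immutable Map makes no promise about iteration order.
  val plain: Map[String, String] = pairs.toMap

  // LinkedHashMap iterates in the order in which the keys were inserted.
  val linked: LinkedHashMap[String, String] = LinkedHashMap(pairs: _*)

  def main(args: Array[String]): Unit = {
    println(plain.keys.mkString(", "))   // order may differ from the input
    println(linked.keys.mkString(", "))  // macys, jcpenny, kohls, star bucks, macy's
  }
}
```

Building the map with LinkedHashMap(pairs: _*) also avoids mutating a map from inside a loop.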

Note, however, that a DataFrame does not guarantee that its rows will always come back in the same order.

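import scala.collection.mutable.LinkedHashMap
val myMap = LinkedHashMap[String, String]()

// collect() brings all rows to the driver; foreach (not map) is used because
// the loop only has a side effect: inserting each row into myMap in arrival order
myDF.collect().foreach(t => myMap += (t(0).toString -> t(1).toString))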

When you print myMap:

res01: scala.collection.mutable.LinkedHashMap[String,String] = Map(macys -> 101, ..)
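Because the row order of a DataFrame is not guaranteed, one option when a reproducible order matters is to sort explicitly before collecting. The following is only a sketch: toLinkedMap and OrderedMap are hypothetical names, not part of Spark, and it assumes the sort column (for example comp_id) really defines the order you want. Since comp_id is a string column, the sort is lexicographic.

```scala
import org.apache.spark.sql.DataFrame
import scala.collection.mutable.LinkedHashMap

object OrderedMap {
  // Hypothetical helper: sorts by one column, then collects the first two
  // (string) columns into an insertion-ordered map.
  def toLinkedMap(df: DataFrame, sortCol: String): LinkedHashMap[String, String] = {
    val result = LinkedHashMap[String, String]()
    df.orderBy(sortCol)   // explicit sort makes the collected order deterministic
      .collect()          // pulls every row to the driver, so keep the data small
      .foreach { row =>
        // Both columns are strings in the question's schema.
        result += (row.getString(0) -> row.getString(1))
      }
    result
  }
}
```

With the question's DataFrame this would be called as OrderedMap.toLinkedMap(myDF, "comp_id"). If the order has to follow the original file rather than a column, the data needs a column that records that position, since Spark does not keep track of it on its own.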
