Below is my dataframe:
val myDF = spark.sql("select company, comp_id from my_db.my_table")
myDF: org.apache.spark.sql.DataFrame = [company: string, comp_id: string]
The data looks like this:
+----------+-------+
|company   |comp_id|
+----------+-------+
|macys     |101    |
|jcpenny   |102    |
|kohls     |103    |
|star bucks|104    |
|macy's    |105    |
+----------+-------+
I'm trying to create a Map collection object (like the one below) in Scala from the above DataFrame.
Map("macys" -> "101", "jcpenny" -> "102" ..., "macy's" -> "105")
Questions:
1) Will the order of the DataFrame's records match the order of the content in the original file underlying the table?
2) If I call collect() on the DataFrame, will the order of the resulting array match the order of the content in the original file?
Explanation: When I run df.collect().map(t => t(0) -> t(1)).toMap, the resulting Map does not appear to preserve insertion order, which is also the default behaviour of a Scala Map.
res01: scala.collection.immutable.Map[Any,Any] = Map(kohls -> 103, jcpenny -> 102 ...)
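The behaviour can be reproduced without Spark. The sketch below is only an illustration: `ToMapOrderSketch` and `rows` are hypothetical names, and the tuples stand in for the collected rows. It relies on how current standard library implementations happen to behave (small immutable maps of up to four entries iterate in insertion order, larger ones are hash-based), which is not a documented ordering contract.

```scala
object ToMapOrderSketch {
  // Hypothetical stand-in for myDF.collect(): (company, comp_id) pairs.
  val rows: Array[(String, String)] = Array(
    "macys" -> "101",
    "jcpenny" -> "102",
    "kohls" -> "103",
    "star bucks" -> "104",
    "macy's" -> "105"
  )

  // With at most four entries the default immutable Map happens to
  // iterate in insertion order (an implementation detail).
  val small: Map[String, String] = rows.take(4).toMap

  // With five or more entries the map is hash-based, so iteration
  // order is unspecified and should not be relied upon.
  val full: Map[String, String] = rows.toMap

  def main(args: Array[String]): Unit = {
    println(small.keys.mkString(", "))
    println(full.keys.mkString(", "))
  }
}
```

Lookups by key are unaffected either way; only the iteration order changes.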
3) So, how can I convert the DataFrame into one of Scala's map collections that actually preserves insertion order (the record sequence)?
Explanation: LinkedHashMap is one of the Scala map types that guarantees insertion order, so I'm trying to find a way to convert the DataFrame into a LinkedHashMap object.
You can use LinkedHashMap. From its Scaladoc page:
"This class implements mutable maps using a hashtable. The iterator and all traversal methods of this class visit elements in the order they were inserted."
Note, however, that a DataFrame does not guarantee that its rows will always come back in the same order unless you sort it explicitly.
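import scala.collection.mutable.LinkedHashMap
// Insertion-ordered map; the reference is never reassigned, so a val is enough
val myMap = LinkedHashMap[String, String]()
// foreach rather than map: only the side effect of adding each row is wanted
myDF.collect().foreach(t => myMap += (t(0).toString -> t(1).toString))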
Printing myMap then gives:
res01: scala.collection.mutable.LinkedHashMap[String,String] = Map(macys -> 101, ..)
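Because the row order coming out of the DataFrame is not guaranteed, it may help to impose an explicit order before building the map. The following is a minimal sketch under stated assumptions, not a definitive implementation: `OrderedMapSketch`, `rows` and `ordered` are hypothetical names, the plain tuples stand in for the result of collect(), and sorting by comp_id is just one possible choice of key. With Spark the same effect could presumably be had by calling orderBy on the DataFrame before collect().

```scala
import scala.collection.immutable.ListMap
import scala.collection.mutable.LinkedHashMap

object OrderedMapSketch {
  // Hypothetical stand-in for myDF.collect(), deliberately out of order.
  val rows: Seq[(String, String)] = Seq(
    "kohls" -> "103",
    "macys" -> "101",
    "macy's" -> "105",
    "jcpenny" -> "102",
    "star bucks" -> "104"
  )

  // Impose an explicit order so the result no longer depends on the
  // order in which the rows happened to arrive.
  val ordered: Seq[(String, String)] = rows.sortBy { case (_, id) => id.toInt }

  // Mutable map that iterates in insertion order.
  val linked: LinkedHashMap[String, String] = LinkedHashMap(ordered: _*)

  // Immutable alternative that also iterates in insertion order.
  val listed: ListMap[String, String] = ListMap(ordered: _*)

  def main(args: Array[String]): Unit = {
    println(linked.keys.mkString(", "))
    println(listed.keys.mkString(", "))
  }
}
```

Building the map directly from the sequence of pairs avoids mutating it inside a loop, and ListMap is an option when an immutable result is preferred, although its lookups are linear in the number of entries.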