
Converting a Spark Dataframe to a mutable Map

I am new to Spark and Scala. I am trying to query a table in Hive (selecting two columns from the table) and convert the resulting DataFrame into a Map. I am using Spark 1.6 with Scala 2.10.6.

Ex:

Dataframe:
+--------+-------+
| address| exists|
+--------+-------+
|address1|   1   |
|address2|   0   |
|address3|   1   |
+--------+-------+ 
This should be converted to: Map("address1" -> 1, "address2" -> 0, "address3" -> 1)

This is the code I am using:

val testMap: scala.collection.mutable.Map[String, Any] = scala.collection.mutable.Map()
val df = hiveContext.sql("select address, exists from testTable")
df.foreach( r => {
  val key = r(0).toString
  val value = r(1)
  testMap += (key -> value)
})
testMap.foreach(println)

When I run the above code, I get this error:

java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;

It throws this error at the line where I try to add the key-value pair to the Map, i.e. testMap += (key -> value).

I know there is a better and simpler way of doing this using org.apache.spark.sql.functions.map. However, I am using Spark 1.6 and I don't think that function is available there; I tried the import and didn't find it in the list of available functions.

Why is my approach giving me an error? And is there a better/more elegant way of achieving this with Spark 1.6?

Any help would be appreciated. Thank you!

UPDATE:

I changed the way elements are added to the Map to the following: testMap.put(key, value).

I was previously using += to add the elements. Now I no longer get the java.lang.NoSuchMethodError. However, no elements are actually added to testMap: after the foreach step completes, I printed the size of the map and all the elements in it, and there are zero elements.

Why are the elements not getting added? I am also open to any other, better approach. Thank you!!

First, the reason testMap stays empty: DataFrame.foreach runs on the executors, so each task mutates its own deserialized copy of the map, and the testMap on the driver is never touched.

The conversion itself can be broken down into 3 steps, each one already solved on SO (see the sketch after the list):

  1. Convert the DataFrame to an RDD[(String, Int)]
  2. Call collectAsMap() on that RDD to get an immutable map
  3. Convert that map into a mutable one (e.g. as described here)
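
A minimal sketch of those three steps in Spark 1.6, assuming the address column is a String and exists is an Int (adjust the Row getters to your actual schema):

import scala.collection.mutable

// 1. DataFrame -> RDD[(String, Int)]
val pairs = df.rdd.map(r => (r.getString(0), r.getInt(1)))

// 2. Collect to the driver; collectAsMap() returns a read-only scala.collection.Map
val collected: Map[String, Int] = pairs.collectAsMap().toMap

// 3. Copy into a mutable map, if you really need one
val testMap: mutable.Map[String, Int] = mutable.Map(collected.toSeq: _*)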

NOTE: I don't know why you need a mutable map; it's worth noting that using a mutable collection rarely makes much sense in Scala. Sticking with immutable objects is safer and easier to reason about, and "forgetting" about the existence of mutable collections makes learning functional APIs (like Spark's!) much easier.

You can simply collect the data from the DataFrame and iterate over it on the driver; that will work:

df.collect().foreach( r => {
  // collect() brings all rows to the driver, so the mutation happens locally
  val key = r(0).toString
  val value = r(1)
  testMap += (key -> value)
})
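
If you don't actually need mutation, a more concise variant (a sketch, same schema assumption as above) builds an immutable map in one expression:

val resultMap: Map[String, Any] = df.collect().map(r => r(0).toString -> r(1)).toMap

Either way, keep in mind that collect() pulls the whole result set into driver memory, so this is only viable when the two-column result fits on a single machine.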
