I have a DataFrame with the following schema:
root
|-- id: string (nullable = true)
|-- scoreMap: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- scores: struct (nullable = true)
| | | |-- SCORE1: double (nullable = true)
| | | |-- SCORE2: double (nullable = true)
| | | |-- SCORE3: double (nullable = true)
| | |-- combinedScore: double (nullable = true)
Sample data:
id scoreMap
id1 Map(key1 -> [[1.0, 3.2, 2.22], 2.42], key2 -> [[3.0, 3.2, 1.2], 4.42])
id2 Map(key3 -> [[1.0, 3.2, 2.22], 3.1], key3 -> [[3.0, 3.2, 1.2], 2.42])
I want to 1) transform the scoreMap column to a list, 2) sort the list by combinedScore in descending order, and 3) append each element's index in the sorted list to the element. For the given example, the result should be:
id scoreList
id1 List([key2, [3.0, 3.2, 1.2], 4.42, 0], [key1, [1.0, 3.2, 2.22], 2.42, 1])
id2 List([key3, [1.0, 3.2, 2.22], 3.1, 0], [key3, [3.0, 3.2, 1.2], 2.42, 1])
How can I accomplish this?
You can do something like this:
import sqlContext.implicits._
import org.apache.spark.sql.functions.udf
// Scores is assumed to be a case class mirroring the map's value struct.
// After toList each element is a (key, value) tuple, so the struct is the
// tuple's second field; negate combinedScore to sort descending.
val mapToSortedList: Map[String, Scores] => List[(String, Scores)] =
  _.toList.sortBy(-_._2.combinedScore)
val mapToListUDF = udf(mapToSortedList)
val newDF = dF.withColumn("scoreMap", mapToListUDF('scoreMap))
My answer does not include the added-index part; I am not sure how to achieve that without more involved code (e.g. pairing each element of the sorted list with its position and building a new element type that carries the index). I hope this helps at least as a starting point.
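To attach the index as well, one option is to sort inside the UDF and then call zipWithIndex on the sorted list. Below is a minimal sketch, not a drop-in solution: the case class names (Scores, RankedEntry), the output column name scoreList, and the DataFrame name dF are placeholders, and it assumes the map's struct values arrive in the Scala UDF as Rows (the usual Spark behavior), so fields are read with getAs.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Hypothetical case classes mirroring the value struct, so the UDF result
// can be encoded back into a DataFrame column.
case class Scores(SCORE1: Double, SCORE2: Double, SCORE3: Double)
case class RankedEntry(key: String, scores: Scores,
                       combinedScore: Double, index: Int)

// Sort the map entries by combinedScore (descending), then zipWithIndex to
// attach each element's position in the sorted list.
val mapToRankedList = udf { m: Map[String, Row] =>
  m.toList
    .sortBy { case (_, v) => -v.getAs[Double]("combinedScore") }
    .zipWithIndex
    .map { case ((k, v), i) =>
      val s = v.getAs[Row]("scores")
      RankedEntry(
        k,
        Scores(s.getAs[Double]("SCORE1"),
               s.getAs[Double]("SCORE2"),
               s.getAs[Double]("SCORE3")),
        v.getAs[Double]("combinedScore"),
        i)
    }
}

val result = dF.withColumn("scoreList", mapToRankedList('scoreMap))
```

Because the UDF returns a list of case-class instances, Spark encodes it as an array of structs, so each element of scoreList carries the key, the scores struct, the combinedScore, and its rank in one struct.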