
Spark dataframe: transform map with StructType value to a sorted list

I have a Dataframe with the following schema:

root
 |-- id: string (nullable = true)
 |-- scoreMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- scores: struct (nullable = true)
 |    |    |    |-- SCORE1: double (nullable = true)
 |    |    |    |-- SCORE2: double (nullable = true)
 |    |    |    |-- SCORE3: double (nullable = true)
 |    |    |-- combinedScore: double (nullable = true)

Sample data:

id   scoreMap
id1   Map(key1 -> [[1.0, 3.2, 2.22], 2.42],   key2 -> [[3.0, 3.2, 1.2], 4.42])
id2   Map(key3 -> [[1.0, 3.2, 2.22], 3.1],   key4 -> [[3.0, 3.2, 1.2], 2.42])

I want to (1) transform the scoreMap column into a list, (2) sort the list by combinedScore in descending order, and (3) add each element's index in the sorted list to that element. For the given example, the result should be:

id   scoreList
id1   List([key2, [3.0, 3.2, 1.2], 4.42, 0], [key1, [1.0, 3.2, 2.22], 2.42, 1])
id2   List([key3, [1.0, 3.2, 2.22], 3.1, 0], [key4, [3.0, 3.2, 1.2], 2.42, 1])

How can I accomplish this?

You can do something like this:

import spark.implicits._
case class ScoreValues(SCORE1: Double, SCORE2: Double, SCORE3: Double)
case class Scores(scores: ScoreValues, combinedScore: Double)
case class Rec(id: String, scoreMap: Map[String, Scores])
// A plain UDF declared over Map[String, Scores] would fail at runtime, because
// Spark hands struct values to UDFs as Rows; the typed Dataset API decodes
// them into the case classes. Sort each map's entries by combinedScore, descending.
val newDF = dF.as[Rec]
  .map(r => (r.id, r.scoreMap.toList.sortBy(-_._2.combinedScore)))
  .toDF("id", "scoreList")

My answer does not include the index part; I am not sure how to achieve that without more involved code (e.g. a custom list type whose sorting attaches the sort index to each element).
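
That said, Scala's `zipWithIndex` may cover the index step without any custom list type: after sorting, it pairs each element with its position, which can then be folded into the element itself. A plain-Scala sketch (the `ScoreValues`/`Scores` case classes are assumptions mirroring the struct schema in the question):

```scala
// Case classes assumed to mirror the map's struct value from the schema
case class ScoreValues(SCORE1: Double, SCORE2: Double, SCORE3: Double)
case class Scores(scores: ScoreValues, combinedScore: Double)

// Sort the entries descending by combinedScore, then attach each element's
// position in the sorted order as a third field
val mapToIndexedList: Map[String, Scores] => List[(String, Scores, Int)] =
  _.toList
    .sortBy(-_._2.combinedScore)
    .zipWithIndex
    .map { case ((key, value), idx) => (key, value, idx) }
```

This function operates on an ordinary Scala `Map`, so it could be applied per row, e.g. inside a typed Dataset `map`.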

I hope this helps, at least as a starting point.
