
Spark dataframe: transform map with StructType value to a sorted list

I have a Dataframe with the following schema:

root
 |-- id: string (nullable = true)
 |-- scoreMap: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- scores: struct (nullable = true)
 |    |    |    |-- SCORE1: double (nullable = true)
 |    |    |    |-- SCORE2: double (nullable = true)
 |    |    |    |-- SCORE3: double (nullable = true)
 |    |    |-- combinedScore: double (nullable = true)

Sample data:

id   scoreMap
id1   Map(key1 -> [[1.0, 3.2, 2.22], 2.42],   key2 -> [[3.0, 3.2, 1.2], 4.42])
id2   Map(key3 -> [[1.0, 3.2, 2.22], 3.1],   key4 -> [[3.0, 3.2, 1.2], 2.42])

I want to (1) transform the scoreMap column into a list, (2) sort the list by combinedScore in descending order, and (3) add each element's index in the sorted list to that element. For the given example, the result should be:

id   scoreList
id1   List([key2, [3.0, 3.2, 1.2], 4.42, 0], [key1, [1.0, 3.2, 2.22], 2.42, 1])
id2   List([key3, [1.0, 3.2, 2.22], 3.1, 0], [key4, [3.0, 3.2, 1.2], 2.42, 1])

How can I accomplish this?

You can do something like this:

import spark.implicits._
case class ScoreValues(SCORE1: Double, SCORE2: Double, SCORE3: Double)
case class Scores(scores: ScoreValues, combinedScore: Double)
case class Rec(id: String, scoreMap: Map[String, Scores])
// A plain UDF declared over Map[String, Scores] would fail at runtime, because
// Spark hands struct values to UDFs as Rows; the typed Dataset API decodes
// them into the case classes. Sort each map's entries by combinedScore, descending.
val newDF = dF.as[Rec]
  .map(r => (r.id, r.scoreMap.toList.sortBy(-_._2.combinedScore)))
  .toDF("id", "scoreList")

My answer does not include the index part; I am not sure how to achieve that without more involved code (e.g. a custom list type whose sorting attaches the sort index to each element).
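
That said, Scala's `zipWithIndex` may cover the index step without any custom list type: after sorting, it pairs each element with its position, which can then be folded into the element itself. A plain-Scala sketch (the `ScoreValues`/`Scores` case classes are assumptions mirroring the struct schema in the question):

```scala
// Case classes assumed to mirror the map's struct value from the schema
case class ScoreValues(SCORE1: Double, SCORE2: Double, SCORE3: Double)
case class Scores(scores: ScoreValues, combinedScore: Double)

// Sort the entries descending by combinedScore, then attach each element's
// position in the sorted order as a third field
val mapToIndexedList: Map[String, Scores] => List[(String, Scores, Int)] =
  _.toList
    .sortBy(-_._2.combinedScore)
    .zipWithIndex
    .map { case ((key, value), idx) => (key, value, idx) }
```

This function operates on an ordinary Scala `Map`, so it could be applied per row, e.g. inside a typed Dataset `map`.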

I hope this helps, at least as a starting point.
