scala: Get the column name corresponding to the max column value from a variable list of columns

Question

I have the following working solution as a test in a Databricks notebook.

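import org.apache.spark.sql.functions.{col, greatest, udf}
import spark.implicits._ // for toDF; already in scope in Databricks notebooks

// Returns a label for the largest of exactly three values; the labels are hard-coded
val maxcol = udf((col1: Long, col2: Long, col3: Long) => {
  if (col1 > col2 && col1 > col3) "col1"
  else if (col2 > col1 && col2 > col3) "col2"
  else "col3" // note: also returned when the two largest values are tied
})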

val someDF = Seq(
  (8, 10, 12, "bat"),
  (64, 61, 59, "mouse"),
  (-27, -30, -15, "horse")
).toDF("number1", "number2", "number3", "word")
  .withColumn("maxColVal", greatest("number1", "number2", "number3"))
  .withColumn("maxColVal_Name", maxcol(col("number1"), col("number2"), col("number3")))

display(someDF)

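Is there a way to make this generic? My use case needs to pass a variable set of columns to this UDF and still get, as output, the name of the column that holds the maximum value, unlike the case above where I hard-code the column names 'col1', 'col2' and 'col3' in the UDF.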
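The comparison inside `maxcol` does not depend on Spark, so it can be generalized to any number of values on its own. A minimal plain-Scala sketch, assuming a hypothetical helper `argMaxName` that receives the column names and one row's values in the same order. Unlike the three-way `if`, which returns `"col3"` whenever the two largest values are equal (even if column 3 is not the maximum), it returns the first (left-most) name on a tie:

```scala
// Hypothetical helper (not part of Spark): returns the name paired with the
// largest value. On ties, the first (left-most) name wins because the
// comparison is strict (>).
def argMaxName(names: Seq[String], values: Seq[Long]): String = {
  require(names.nonEmpty && names.length == values.length, "need one name per value")
  names.zip(values)
    .reduceLeft((best, cur) => if (cur._2 > best._2) cur else best)
    ._1
}

val cols = Seq("number1", "number2", "number3")
println(argMaxName(cols, Seq(8L, 10L, 12L)))     // number3
println(argMaxName(cols, Seq(64L, 61L, 59L)))    // number1
println(argMaxName(cols, Seq(-27L, -30L, -15L))) // number3
```

Inside Spark, a function like this could be wrapped in a UDF over an `array(...)` of the chosen columns, with the name list captured in the closure, so the UDF no longer hard-codes any column names.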

Answer 1

You can use the following:

    import org.apache.spark.sql.functions.{greatest, lit, map, udf}
    import spark.implicits._ // for toDF; already in scope in Databricks notebooks

    val df = List((1,2,3,5,"a"),(4,2,3,1,"a"),(1,20,3,1,"a"),(1,22,22,2,"a")).toDF("mycol1","mycol2","mycol3","mycol4","mycol5")

    // List all the columns among which you want to find the max value
    val colGroup = List(df("mycol1"), df("mycol2"), df("mycol3"), df("mycol4"))

    // Alternate column value and column name for the same columns,
    // so that map() builds a value -> column-name map for each row
    val colGroupMap = List(
      df("mycol1"), lit("mycol1"),
      df("mycol2"), lit("mycol2"),
      df("mycol3"), lit("mycol3"),
      df("mycol4"), lit("mycol4"))

    // The largest key is the max value, so its entry holds the column name.
    // Equal values become duplicate map keys: Spark 3.0+ throws on them unless
    // spark.sql.mapKeyDedupPolicy is set to LAST_WIN (last column wins, as in the output below).
    val maxcol = udf((colVal: Map[Int, String]) => colVal.max._2)

    df.withColumn("maxColValue", greatest(colGroup: _*))
      .withColumn("maxColVal_Name", maxcol(map(colGroupMap: _*)))
      .show(false)

    +------+------+------+------+------+-----------+--------------+
    |mycol1|mycol2|mycol3|mycol4|mycol5|maxColValue|maxColVal_Name|
    +------+------+------+------+------+-----------+--------------+
    |1     |2     |3     |5     |a     |5          |mycol4        |
    |4     |2     |3     |1     |a     |4          |mycol1        |
    |1     |20    |3     |1     |a     |20         |mycol2        |
    |1     |22    |22    |2     |a     |22         |mycol3        |
    +------+------+------+------+------+-----------+--------------+
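The answer still lists every column name by hand, and equal values turn into duplicate map keys. A minimal sketch of a variant driven by a single list of column names, assuming non-null integer columns and a hypothetical helper `maxColNameUdf`: the values are packed into an `array(...)` column instead of a map, the names are captured in the UDF's closure, and the first (left-most) column wins on ties. It builds a local `SparkSession` so it can run outside Databricks; in a notebook, `getOrCreate()` returns the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, greatest, udf}

val spark = SparkSession.builder().master("local[*]").appName("maxColName").getOrCreate()
import spark.implicits._

// Hypothetical helper: builds a UDF that maps an array of values to the name of
// the column holding the largest one. `names` must follow the array's element
// order; indexOf returns the first match, so the left-most column wins on ties.
// Assumes non-null values.
def maxColNameUdf(names: Seq[String]) =
  udf((values: Seq[Int]) => names(values.indexOf(values.max)))

val df = List((1,2,3,5,"a"),(4,2,3,1,"a"),(1,20,3,1,"a"),(1,22,22,2,"a"))
  .toDF("mycol1","mycol2","mycol3","mycol4","mycol5")

// The only place the column list appears (greatest needs at least two columns)
val valueCols = Seq("mycol1", "mycol2", "mycol3", "mycol4")
val valueColumns = valueCols.map(col)

val result = df
  .withColumn("maxColValue", greatest(valueColumns: _*))
  .withColumn("maxColVal_Name", maxColNameUdf(valueCols)(array(valueColumns: _*)))

result.show(false)
```

On the sample data this yields `mycol4`, `mycol1` and `mycol2` for the first three rows, and `mycol2` for the tied last row (22 in both `mycol2` and `mycol3`), where the answer's map-based output shows `mycol3`. Because no map keys are built from the values, repeated values do not trigger Spark 3's duplicate-map-key check.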

Answer 1 (score 2), accepted 2020-06-09 02:17:14