How to get the column name of the greatest value using the greatest() function in PostgreSQL?
How to get the greatest() column value and name from a list[str] of column names in PySpark SQL?
I have the following sample code:

    lst1 = ["a", "b"]
    for ls1 in lst1:
        new_lst = []
        lst2 = ["d", "e", "f"]
        for ls2 in lst2:
            new_lst.append(ls1 + ls2)
        df = df.withColumn("final_" + ls1, greatest(*new_lst))

Is this the correct way to get the greatest value from the list of columns in each loop? Also, is there a way to get the corresponding column name?
Example:

Input df =>

    ad  ae  af  bd  be  bf  cd  ce  cf
    ----------------------------------
    10  11  12  13  14  15  16  17  18
    19  20  21  22  23  24  25  26  27
    28  29  30  31  32  33  34  35  36

Expected output df =>

    ad  ae  af  final_a  bd  be  bf  final_b
    -----------------------------------------
    10  11  12       12  13  14  15       15
    19  20  21       21  22  23  24       24
    28  29  30       30  31  32  33       33
Thanks!
This works if you want to get the greatest value among the columns that share the same prefix (such as "a", "b", "c").
    from pyspark.sql import functions as f

    columns = df.columns
    prefixes = set(map(lambda c: c[0], columns))
    for prefix in prefixes:
        df = df.withColumn(
            'final_' + prefix,
            f.array_max(f.array(*[f.col(c) for c in columns if c.startswith(prefix)]))
        )
    df.show()
+---+---+---+---+---+---+---+---+---+-------+-------+-------+
| ad| ae| af| bd| be| bf| cd| ce| cf|final_c|final_a|final_b|
+---+---+---+---+---+---+---+---+---+-------+-------+-------+
| 10| 11| 12| 13| 14| 15| 16| 17| 18| 18| 12| 15|
| 19| 20| 21| 22| 23| 24| 25| 26| 27| 27| 21| 24|
| 28| 29| 30| 31| 32| 33| 34| 35| 36| 36| 30| 33|
+---+---+---+---+---+---+---+---+---+-------+-------+-------+