Looking to create a column of "rank arrays" based on another column of Array(Float) type
Here is my dataset:
score
[0.3, 0.5]
[0.1, 0.6, 0.7]
Desired dataset:
score rank
[0.3, 0.5] [1, 2]
[0.1, 0.6, 0.7] [1, 2, 3]
This is my initial attempt:
df_upd = df.withColumn("rank", F.array([F.lit(i) for i in range(1, F.size("score") + 1)]))
I get this error:
TypeError: range() integer end argument expected, got Column
I'm wondering if there is a concise way to do this, or whether I will have to explode the df and then create a rank column using Window functions.
It looks like you just want to create a sequence from 1 to size(score). Python's range() fails here because it needs a concrete integer, while F.size("score") is an unevaluated Column expression. You can use Spark's sequence function for this:
from pyspark.sql import functions as F
df = spark.createDataFrame([([0.3, 0.5],), ([0.1, 0.6, 0.7],)], ["score"])
df.withColumn("rank", F.expr("sequence(1, size(score))")).show()
#+---------------+---------+
#| score| rank|
#+---------------+---------+
#| [0.3, 0.5]| [1, 2]|
#|[0.1, 0.6, 0.7]|[1, 2, 3]|
#+---------------+---------+