Looking to create a column of "rank arrays" based on another column of Array(Float) type
Here is my dataset:
score
[0.3, 0.5]
[0.1, 0.6, 0.7]
Desired dataset:
score rank
[0.3, 0.5] [1, 2]
[0.1, 0.6, 0.7] [1, 2, 3]
This is my initial attempt:
df_upd = df.withColumn("rank", F.array([F.lit(i) for i in range(1, F.size("score") + 1)]))
I get this error:
TypeError: range() integer end argument expected, got Column
I'm wondering if there is a concise way to do this, or whether I will have to explode the df and then create a rank column using Window functions.
It looks like you just want to create a sequence from 1 to size(score). Python's range() fails here because it needs a concrete integer, while F.size("score") is an unevaluated Column expression. You can use Spark's sequence function for this:
from pyspark.sql import functions as F
df = spark.createDataFrame([([0.3, 0.5],), ([0.1, 0.6, 0.7],)], ["score"])
df.withColumn("rank", F.expr("sequence(1, size(score))")).show()
#+---------------+---------+
#| score| rank|
#+---------------+---------+
#| [0.3, 0.5]| [1, 2]|
#|[0.1, 0.6, 0.7]|[1, 2, 3]|
#+---------------+---------+