
Looking to create a column of "rank arrays" based on another column of Array(Float) type

Here is my dataset:

score  
[0.3, 0.5]
[0.1, 0.6, 0.7]

Desired dataset:

score            rank 
[0.3, 0.5]      [1, 2]
[0.1, 0.6, 0.7] [1, 2, 3]

This is my initial attempt:

df_upd = df.withColumn("rank", F.array([F.lit(i) for i in range(1, F.size("score") + 1)]))

I get this error:

TypeError: range() integer end argument expected, got Column.

I'm wondering if there is a concise way to do this, or whether I will have to explode df and then create a rank column using Window functions.

It looks like you just want to create a sequence from 1 to size(score). You can use the sequence function for that:

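from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([0.3, 0.5],), ([0.1, 0.6, 0.7],)], ["score"])

# sequence(1, size(score)) builds [1, 2, ..., n] where n is the length of each row's array
df.withColumn("rank", F.expr("sequence(1, size(score))")).show()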

#+---------------+---------+
#|          score|     rank|
#+---------------+---------+
#|     [0.3, 0.5]|   [1, 2]|
#|[0.1, 0.6, 0.7]|[1, 2, 3]|
#+---------------+---------+ 
