
New column with previous rows' values

I'm working with PySpark and I have a DataFrame like this:

+---+-----+
| id|value|
+---+-----+
|  1|   65|
|  2|   66|
|  3|   65|
|  4|   68|
|  5|   71|
+---+-----+

I want to generate a DataFrame like this, where prev_value holds the values of all previous rows (ordered by id), most recent first:

+---+-----+-------------+
| id|value| prev_value  |
+---+-----+-------------+
| 1 | 65  | null        |
| 2 | 66  | 65          |
| 3 | 65  | 66,65       |
| 4 | 68  | 65,66,65    |
| 5 | 71  | 68,65,66,65 |
+---+-----+-------------+
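
To pin down the rule behind the table above, here is a minimal plain-Python sketch of it, with no Spark involved. The function name `prev_values` is purely illustrative, and it assumes the input is already sorted by id.

```python
def prev_values(values):
    """For each position, join all earlier values, most recent first.

    The first position has no earlier values, so it maps to None.
    Assumes `values` is already ordered by id.
    """
    out = []
    for i in range(len(values)):
        earlier = values[:i][::-1]  # everything before row i, newest first
        out.append(",".join(str(v) for v in earlier) if earlier else None)
    return out


print(prev_values([65, 66, 65, 68, 71]))
```

This is only a reference for the expected result; it says nothing about how Spark should compute it.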

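Here is one way: collect the lagged values over a window ordered by id, then join the resulting list into a string with a UDF.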

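import pyspark.sql.functions as f
from pyspark.sql.types import StringType
from pyspark.sql.window import Window

# Single, unpartitioned window ordered by id. With an ORDER BY, the default
# frame runs from the first row up to the current row.
win = Window.partitionBy().orderBy(f.col('id'))

# Running list of the lagged value: lag() is null on the first row and
# collect_list() skips nulls, so each row collects only the values before it.
df = df.withColumn('prev_value', f.collect_list(f.lag('value').over(win)).over(win))

# UDF that joins the list most-recent-first; an empty list becomes the literal string 'null'.
concat = f.udf(lambda x: 'null' if len(x) == 0 else ','.join(str(elt) for elt in x[::-1]), StringType())
df = df.withColumn('prev_value', concat('prev_value'))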


