
Perform Lag over multiple columns using PySpark

I'm fairly new to PySpark, but I am trying to use best practices in my code. I have a PySpark dataframe and I would like to lag multiple columns, replacing the original values with the lagged values. Example:

ID     date        value1     value2     value3
1      2021-12-23  1.1        4.0        2.2
2      2021-12-21  2.4        1.6        11.9
1      2021-12-24  5.4        3.2        7.8
2      2021-12-22  4.2        1.4        9.0
1      2021-12-26  2.3        5.2        7.6
.
.
.

I'd like to partition the rows by ID, order them by date, then lag the values by some amount. The code I have so far:

from pyspark.sql import functions as F, Window

window = Window.partitionBy(F.col("ID")).orderBy(F.col("date"))

valueColumns = ['value1', 'value2', 'value3']

df = F.lag(valueColumns, offset=shiftAmount).over(window)

My desired output would be:

ID     date        value1     value2     value3
1      2021-12-23  Null       Null       Null
2      2021-12-21  Null       Null       Null
1      2021-12-24  1.1        4.0        2.2
2      2021-12-22  2.4        1.6        11.9
1      2021-12-26  5.4        3.2        7.8
.
.
.

The problem I'm having is that, from what I can find, F.lag only accepts a single column. I'm looking for suggestions on how to best accomplish this. I suppose I could use a for loop to append the shifted columns one at a time, but this seems pretty inelegant. Thanks!

A simple list comprehension over the column names should do the job:

df = df.select(
    "ID", "date",
    *[F.lag(c, offset=shiftAmount).over(window).alias(c) for c in valueColumns]
)
