
How to replace any null in a PySpark DataFrame with the value from the row below, in the same column

Let's say I have a PySpark DataFrame:

| Column A | Column B |
| -------- | -------- |
| val1     | val1B    |
| null     | val2B    |
| val2     | null     |
| val3     | val3B    |

Can someone help me replace every null value in any column (across the whole DataFrame) with the value directly below it? The final table should look like this:

| Column A | Column B |
| -------- | -------- |
| val1     | val1B    |
| val2     | val2B    |
| val2     | val3B    |
| val3     | val3B    |
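The requested rule can be pinned down with a small plain-Python sketch. It does not use Spark; it assumes the rows are already in the order shown in the table, and `fill_from_next_row` is a hypothetical helper name. Each null takes the value from the same column one row below, and a null in the last row has nothing below it, so it stays null.

```python
# Plain-Python sketch of the requested rule (not Spark).
# Assumption: rows are already in the order shown in the table.
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_from_next_row(rows: List[Row]) -> List[Row]:
    """Replace each None with the value from the same column one row below.

    Only the immediately following row is consulted; a None in the
    last row has no row below it and therefore stays None.
    """
    filled: List[Row] = []
    for i, row in enumerate(rows):
        below = rows[i + 1] if i + 1 < len(rows) else {}
        filled.append({k: v if v is not None else below.get(k) for k, v in row.items()})
    return filled


rows = [
    {"Column A": "val1", "Column B": "val1B"},
    {"Column A": None, "Column B": "val2B"},
    {"Column A": "val2", "Column B": None},
    {"Column A": "val3", "Column B": "val3B"},
]

for r in fill_from_next_row(rows):
    print(r["Column A"], r["Column B"])  # reproduces the expected table row by row
```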

How could this be done? A code demo would be appreciated if possible. Thank you!

So far, all I've managed is to number the rows and build a condition that finds the row numbers containing null values. That leaves me with a table like this:

| Column A | Column B | row_num |
| -------- | -------- | ------- |
| null     | val2B    | 2       |
| val2     | null     | 3       |

But I don't think this step is needed. I'm stuck on what to do next.

Use a list comprehension to coalesce each column with the lead window function, which returns the value from the next row. Code below:

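from pyspark.sql import Window
from pyspark.sql.functions import coalesce, col, lead, monotonically_increasing_id

# No partitionBy: all rows share one window (fine for small data); order follows the current row order
w = Window.orderBy(monotonically_increasing_id())

# Keep each value if non-null, otherwise take the value from the next row in the same column
df.select(*[coalesce(col(x), lead(x).over(w)).alias(x) for x in df.columns]).show()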


+--------+--------+
|Column A|Column B|
+--------+--------+
|    val1|   val1B|
|    val2|   val2B|
|    val2|   val3B|
|    val3|   val3B|
+--------+--------+
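Note that lead looks exactly one row ahead, so when two nulls are stacked in the same column, the upper one is coalesced with another null and stays null. If that case matters, a backward fill is needed instead; in PySpark it can be expressed with first(x, ignorenulls=True) over a window whose frame runs from Window.currentRow to Window.unboundedFollowing. The plain-Python sketch below contrasts the two behaviours on made-up data with two consecutive nulls; it does not run Spark, assumes the list is already in row order, and the helper names are hypothetical.

```python
# Plain-Python comparison (not Spark) of a one-row-ahead fill vs. a backward fill.
# Assumption: the list is already in the intended row order.
from typing import List, Optional


def fill_one_ahead(values: List[Optional[str]]) -> List[Optional[str]]:
    """Mimics coalesce(col, lead(col)): only the next row is consulted."""
    return [
        v if v is not None else (values[i + 1] if i + 1 < len(values) else None)
        for i, v in enumerate(values)
    ]


def backfill(values: List[Optional[str]]) -> List[Optional[str]]:
    """Mimics first(col, ignorenulls=True) over currentRow..unboundedFollowing:
    each row takes the nearest non-null value at or below it."""
    result: List[Optional[str]] = [None] * len(values)
    next_non_null: Optional[str] = None
    for i in range(len(values) - 1, -1, -1):  # walk from the bottom row up
        if values[i] is not None:
            next_non_null = values[i]
        result[i] = next_non_null
    return result


column = ["val1", None, None, "val4"]  # made-up data with two stacked nulls
print(fill_one_ahead(column))  # ['val1', None, 'val4', 'val4']
print(backfill(column))        # ['val1', 'val4', 'val4', 'val4']
```

For the data in the question there are no stacked nulls, so both approaches give the same result.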
