How to replace any null in pyspark df with value from the below row, same column
Let's say I have a pyspark DF:
| Column A | Column B |
| -------- | -------- |
| val1 | val1B |
| null | val2B |
| val2 | null |
| val3 | val3B |
Can someone help me with replacing any null value in any column (for the whole df) with the value right below it? So the final table should look like this:
| Column A | Column B |
| -------- | -------- |
| val1 | val1B |
| val2 | val2B |
| val2 | val3B |
| val3 | val3B |
How could this be done? Can I get a code demo if possible? Thank you!
All I've really gotten through is numbering all the rows and creating a condition to find the rows that contain null values. So I'm left with a table like this:
| Column A | Column B | row_num |
| -------- | -------- | ------- |
| null | val2B | 2 |
| val2 | null | 3 |
But I don't think this step is needed. I'm stuck as to what to do.
Use a list comprehension to coalesce each column with the lead window function, which looks one row ahead. Code below:
from pyspark.sql.functions import coalesce, col, lead, monotonically_increasing_id
from pyspark.sql.window import Window

# Order rows by a stable id, then replace each null with the value one row below it
w = Window.partitionBy().orderBy(monotonically_increasing_id())
df.select(*[coalesce(col(x), lead(x).over(w)).alias(x) for x in df.columns]).show()
+--------+--------+
|Column A|Column B|
+--------+--------+
| val1| val1B|
| val2| val2B|
| val2| val3B|
| val3| val3B|
+--------+--------+
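To see the core idea without a Spark session, here is a minimal plain-Python sketch of the same one-step backward fill: for each cell that is null, take the value from the row directly below in the same column. `fill_nulls_from_below` is a hypothetical helper name for illustration, not part of any library. Note that, like `lead(1)` in the Spark answer, a single pass only fixes one null in a run; consecutive nulls in the same column would need repeated passes.

```python
def fill_nulls_from_below(rows):
    """Replace each None with the value one row below it, same column.

    rows: list of equal-length lists; None marks a null cell.
    Mirrors coalesce(col, lead(col, 1)): only looks one row ahead.
    """
    filled = []
    n = len(rows)
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is None and i + 1 < n:
                value = rows[i + 1][j]  # take the cell directly below
            new_row.append(value)
        filled.append(new_row)
    return filled


rows = [
    ["val1", "val1B"],
    [None, "val2B"],
    ["val2", None],
    ["val3", "val3B"],
]
print(fill_nulls_from_below(rows))
```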