Pyspark: How to chain Column.when() using a dictionary with reduce?

I'm trying to build a chain of when() calls from a dictionary of conditions using reduce(), and then pass the result to dataframe.withColumn().

For example:

conditions = {
    "0": (col("a") == 1.0) & (col("b") != 1.0),
    "1": (col("c") == 1.0) & (col("d") == 1.0)
}

Using reduce(), I implemented this:

when_stats = reduce(lambda key, value: when(conditions[key], lit(key)), conditions)

and later used it in withColumn():

df2 = df1.withColumn(result, when_stats)

The problem is that it only takes the first condition, "0", and doesn't chain the second one. Printing when_stats gives me:

Column<'CASE WHEN ((a = 1.0) AND (NOT (b = 1.0))) THEN 0 END'>

When I add a third condition, it throws an error:

TypeError: unhashable type: 'Column'

So the question is: how can I loop through the dictionary and create the complete when().when().when() ... chain? Is there a better solution, especially if I want to have otherwise() at the end?

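When you use reduce with a dict object, you're actually iterating over the keys of the dict. So the lambda function takes acc, the accumulator, and key, the key currently being processed. Without an initial value, reduce uses the first key as the starting accumulator, so with two keys the lambda runs only once and returns just the first when(). With a third key, the Column returned by that first call is passed back in as the first argument, and conditions[key] fails because a Column is unhashable.

You can use this instead: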
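To make the behaviour of the original lambda easier to see, here is a minimal sketch that needs no Spark at all. The strings `cond_a`, `cond_b`, `cond_c` and the helper `step` are hypothetical stand-ins, and a plain list plays the role of the unhashable Column; it only traces which arguments reduce passes in when no initial value is given.

```python
from functools import reduce

# Plain strings stand in for the Column conditions; no Spark is needed here.
conditions = {"0": "cond_a", "1": "cond_b"}

calls = []

def step(left, right):
    # Record exactly what reduce passes in on each call.
    calls.append((left, right))
    # Like the lambda in the question, this ignores `right` and returns a
    # brand-new object (a list here, which is unhashable like a Column).
    return ["when", conditions[left], left]

result = reduce(step, conditions)
print(calls)   # [('0', '1')]
print(result)  # ['when', 'cond_a', '0']

# With a third key, the second call receives the previous result as `left`,
# so the dictionary lookup fails with a TypeError.
conditions["2"] = "cond_c"
try:
    reduce(step, conditions)
except TypeError as exc:
    error = str(exc)
    print(error)
```

With two keys the function is called once, with both keys as arguments, which is why only the first condition appears in the printed Column. With three keys the second call tries to use the previous result as a dictionary key.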

from functools import reduce
from pyspark.sql import functions as F

conditions = {
    "0": (F.col("a") == 1.0) & (F.col("b") != 1.0),
    "1": (F.col("c") == 1.0) & (F.col("d") == 1.0)
}

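when_stats = reduce(
    lambda acc, key: acc.when(conditions[key], key),
    conditions,  # iterating a dict yields its keys
    F  # initial accumulator: the first call is F.when(...), later ones are Column.when(...)
)  # append .otherwise("default_value") here to set a default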

print(when_stats)
# Column<'CASE WHEN ((a = 1.0) AND (NOT (b = 1.0))) THEN 0 WHEN ((c = 1.0) AND (d = 1.0)) THEN 1 END'>
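If the chain is needed in several places, the same reduce call could be wrapped in a small helper that also handles the optional otherwise(). This is only a sketch: `chain_when` and its parameters are hypothetical names, not part of PySpark, and it assumes `start` is an object whose when() begins the chain (the `F` module in the answer above).

```python
from functools import reduce

def chain_when(conditions, start, default=None):
    """Fold a {label: condition} dict into one when().when()... chain.

    `start` is the object whose .when() begins the chain; with PySpark that
    would be the pyspark.sql.functions module, as in the answer above.
    """
    if not conditions:
        raise ValueError("conditions must contain at least one entry")
    chained = reduce(
        lambda acc, key: acc.when(conditions[key], key),
        conditions,  # iterates the keys in insertion order
        start,
    )
    # Only close the chain with otherwise() when a default was requested.
    return chained if default is None else chained.otherwise(default)

# Hypothetical usage with PySpark (not executed here):
#   from pyspark.sql import functions as F
#   df2 = df1.withColumn("result", chain_when(conditions, F, "default_value"))
```

Because a dict preserves insertion order, the WHEN branches are emitted in the order the conditions were defined, so the first matching entry wins.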
