
pySpark withColumn with two conditions

I want to check two conditions: clean_reference.Output == " " and clean_reference.Primary == "DEFAULT". If both conditions apply, the result should be clean_reference.Output, otherwise "NI".

The code below is not accepting my clean_reference.Output as my when() value.

final_reference = clean_reference.withColumn("Output",f.when(clean_reference.Output == " ")| (clean_reference.Primary == "DEFAULT"), clean_reference.Output).otherwise("NI")
TypeError: when() missing 1 required positional argument: 'value'

Wrap your columns in f.col() and the value to assign in f.lit(). The TypeError happens because a closing parenthesis ends the when() call right after the first condition, so when() receives only the condition and never the value argument.

final_reference = clean_reference.withColumn("Output",\
                       f.when((f.col("Output") == " ")|                              
                             (f.col("Primary") ==\
                              "DEFAULT"), f.col("Output"))\
                                             .otherwise(f.lit("NI")))

Same code, just with the parentheses fixed.

final_reference = clean_reference.withColumn(
        "OutputItemNameByValue",
        f.when( 
          (clean_reference.OutputItemNameByValue == " ") | 
          (clean_reference.PrimaryLookupAttributeValue == "TRIANA_DEFAULT"),
          clean_reference.OutputItemNameByValue
        ).otherwise("Not Implemented")
)
