
pySpark withColumn with two conditions

I want to test two conditions: clean_reference.Output == " " and clean_reference.Primary == "DEFAULT". If the conditions apply, keep clean_reference.Output; otherwise use "NI".

The code below does not accept clean_reference.Output as the when() value.

final_reference = clean_reference.withColumn("Output",f.when(clean_reference.Output == " ")| (clean_reference.Primary == "DEFAULT"), clean_reference.Output).otherwise("NI")
TypeError: when() missing 1 required positional argument: 'value'

The TypeError comes from a misplaced parenthesis: the closing parenthesis after " " ends the when() call with only the condition, so no value argument is passed. Keep both conditions inside when(), wrap your columns in f.col(), and wrap the literal fallback in f.lit():

final_reference = clean_reference.withColumn(
    "Output",
    f.when(
        (f.col("Output") == " ") | (f.col("Primary") == "DEFAULT"),
        f.col("Output"),
    ).otherwise(f.lit("NI")),
)

Same code as in the question, just with the parentheses fixed so that both conditions sit inside the when() call:

final_reference = clean_reference.withColumn(
        "OutputItemNameByValue",
        f.when( 
          (clean_reference.OutputItemNameByValue == " ") | 
          (clean_reference.PrimaryLookupAttributeValue == "TRIANA_DEFAULT"),
          clean_reference.OutputItemNameByValue
        ).otherwise("Not Implemented")
)

