pySpark withColumn with two conditions
I want to filter on two conditions: clean_reference.Output == " "
and clean_reference.Primary == "DEFAULT".
If both conditions apply, return clean_reference.Output; otherwise "NI".
The code below is not accepting my clean_reference.Output
as my when() value.
final_reference = clean_reference.withColumn("Output",f.when(clean_reference.Output == " ")| (clean_reference.Primary == "DEFAULT"), clean_reference.Output).otherwise("NI")
TypeError: when() missing 1 required positional argument: 'value'
Wrap your columns in f.col() and the value to assign in f.lit().
Also note where your parenthesis closes: in your code, when() is closed right after the first condition, so it receives only one argument, which is why you get the TypeError about the missing 'value'.
final_reference = clean_reference.withColumn(
    "Output",
    f.when(
        (f.col("Output") == " ") | (f.col("Primary") == "DEFAULT"),
        f.col("Output")
    ).otherwise(f.lit("NI"))
)
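The when()/otherwise() logic in the answer above can be sanity-checked without a Spark session. Below is a minimal plain-Python sketch of the same row-level rule (the function name and sample rows are illustrative, not from the original post): keep Output when either condition holds, otherwise fall back to "NI".

```python
# Plain-Python sketch of the when/otherwise rule used above:
# keep `output` if it is " " or `primary` is "DEFAULT", else "NI".
def resolve_output(output, primary):
    if output == " " or primary == "DEFAULT":
        return output
    return "NI"

# Illustrative rows: (Output, Primary)
rows = [(" ", "OTHER"), ("x", "DEFAULT"), ("x", "OTHER")]
print([resolve_output(o, p) for o, p in rows])  # [' ', 'x', 'NI']
```

This also shows why each comparison must be parenthesized in the PySpark version: `|` on Column objects binds more tightly than `==`, so `f.col("Output") == " " | f.col("Primary") == "DEFAULT"` would not group the way the plain-Python `or` does.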
Same code, just with the parentheses fixed.
final_reference = clean_reference.withColumn(
"OutputItemNameByValue",
f.when(
(clean_reference.OutputItemNameByValue == " ") |
(clean_reference.PrimaryLookupAttributeValue == "TRIANA_DEFAULT"),
clean_reference.OutputItemNameByValue
).otherwise("Not Implemented")
)