Pyspark - withColumn + when with variable give "Method or([class java.lang.Boolean]) does not exist"

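I need to add a column to a DataFrame based on one of its other columns and on the value of a variable (represented here as otherThing); see below: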

# `spark` is the existing SparkSession (for example, in a Databricks notebook)
from pyspark.sql.functions import when

otherThing = "test"
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                                ("id", "code", "amt"))
# this works fine
dataDF.withColumn("new_column", when((dataDF["id"] <= 70), "A").otherwise("B")).display()
# this gives me an error
dataDF.withColumn("new_column", when((dataDF["id"] <= 70) | (otherThing == ""), "A").otherwise("B")).display()

This returns the following error: Method or([class java.lang.Boolean]) does not exist. In the example otherThing is a constant, but in the real scenario it can take different values.

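The problem is caused by the missing lit around the variable comparison. Python evaluates otherThing == "" to a plain bool before Spark ever sees it, and the | operator of a Column then tries to pass that bool to the JVM-side or method, which only accepts another Column. Wrapping the value in lit turns it into a Column.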
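To see what Spark actually receives on the right-hand side of the |, it may help to evaluate the comparison on its own. This is a minimal plain-Python sketch that needs no Spark session; the name flag is only an illustration and does not appear in the question.

```python
# The comparison is evaluated eagerly by Python, not by Spark.
otherThing = "test"
flag = (otherThing == "")

# `flag` is an ordinary Python bool, which is why the JVM reports
# that no `or` method exists for java.lang.Boolean.
print(type(flag).__name__, flag)  # prints: bool False
```

Because the value is already a bool at this point, it has to be converted to a Column (for example with lit) before it can be combined with another Column expression.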

https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.lit.html
https://sparkbyexamples.com/pyspark/pyspark-lit-add-literal-constant/

Working code:

import pyspark.sql.functions as F

otherThing = ""
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                                ("id", "code", "amt"))
# F.lit(...) wraps the Python bool in a Column so it can be combined with |
dataDF.withColumn("new_column", F.when((dataDF["id"] <= 70) | F.lit(otherThing == ""), "A").otherwise("B")).display()

