[英]Pyspark - withColumn + when with variable give "Method or([class java.lang.Boolean]) does not exist"
我需要根據其他列之一和變量值(此處表示為otherThing
)向數據框添加一列,見下文:
otherThing = "test"
dataDF = spark.createDataFrame([(66, "a", "4"),
(67, "a", "0"),
(70, "b", "4"),
(71, "d", "4")],
("id", "code", "amt"))
#this works fine
dataDF.withColumn("new_column", when((dataDF["id"] <= 70), "A").otherwise("B")).display()
#this gives me error
dataDF.withColumn("new_column", when((dataDF["id"] <= 70) | (otherThing == ""), "A").otherwise("B")).display()
這將返回以下錯誤:Method or([class java.lang.Boolean]) does not exist 在示例otherThing
是常量,但在實際場景中它可以有不同的值
問題是由於變量缺少lit
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.lit.html https://sparkbyexamples.com/pysplit-add-park/pyspark -不變/
工作代碼:
import pyspark.sql.functions as F
otherThing = ""
dataDF = spark.createDataFrame([(66, "a", "4"),
(67, "a", "0"),
(70, "b", "4"),
(71, "d", "4")],
("id", "code", "amt"))
dataDF.withColumn("new_column", when((dataDF["id"] <= 70) | F.lit(otherThing == ""), "A").otherwise("B")).display()
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.