Pyspark - withColumn + when with variable give "Method or([class java.lang.Boolean]) does not exist"

I need to add a column to a DataFrame based on one of its other columns and on the value of a variable (shown here as otherThing), see below:

from pyspark.sql.functions import when

otherThing = "test"
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                                ("id", "code", "amt"))
# this works fine
dataDF.withColumn("new_column", when((dataDF["id"] <= 70), "A").otherwise("B")).display()
# this gives me the error
dataDF.withColumn("new_column", when((dataDF["id"] <= 70) | (otherThing == ""), "A").otherwise("B")).display()

This returns the following error: Method or([class java.lang.Boolean]) does not exist. In the example otherThing is a constant, but in the real scenario it can take different values.

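The problem is that the comparison on the variable is not wrapped in lit. Python evaluates otherThing == "" to a plain bool before Spark sees it, and the | operator of a Column only accepts another Column. lit converts the bool into a Column.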

https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.lit.html
https://sparkbyexamples.com/pysplit-add-park/pyspark -不變/
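
To see where the error message comes from, it may help to imitate the operator dispatch without Spark. The sketch below uses a hypothetical ToyColumn class and a hypothetical toy_lit helper. Both are stand-ins written for illustration, not PySpark's real Column or lit, and they only model the assumption that the underlying "or" method is defined for another column and nothing else.

```python
class ToyColumn:
    """Hypothetical stand-in for a Spark Column; not PySpark's real class."""

    def __init__(self, expr):
        self.expr = expr  # textual form of the expression, for display only

    def __or__(self, other):
        # Model the assumption: `or` is defined for another column only.
        if not isinstance(other, ToyColumn):
            raise TypeError(
                f"Method or([{type(other).__name__}]) does not exist"
            )
        return ToyColumn(f"({self.expr} OR {other.expr})")


def toy_lit(value):
    """Hypothetical analogue of lit: wrap a Python value as a column."""
    return ToyColumn(repr(value))


other_thing = "test"
id_check = ToyColumn("id <= 70")

# Python evaluates the comparison first, so `flag` is just a bool.
flag = other_thing == ""

try:
    id_check | flag  # column on the left, plain bool on the right
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the bool gives `|` a column on both sides.
combined = id_check | toy_lit(flag)
print(error_message)
print(combined.expr)
```

The same reasoning applies to & and to comparisons against any other driver-side Python value: the value has to become a column expression before it can be combined with one.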

Working code:

import pyspark.sql.functions as F

otherThing = ""
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                                ("id", "code", "amt"))
# F.lit turns the Python bool into a Column so it can be combined with |
dataDF.withColumn("new_column", F.when((dataDF["id"] <= 70) | F.lit(otherThing == ""), "A").otherwise("B")).display()

