I need to add a column to a data frame based on one of the other columns AND the value of a variable (represented here as otherThing); see below:
from pyspark.sql.functions import when

otherThing = "test"
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                               ("id", "code", "amt"))
# this works fine
dataDF.withColumn("new_column", when((dataDF["id"] <= 70), "A").otherwise("B")).display()
# this gives me an error
dataDF.withColumn("new_column", when((dataDF["id"] <= 70) | (otherThing == ""), "A").otherwise("B")).display()
This returns the following error: Method or([class java.lang.Boolean]) does not exist
In the example otherThing is constant, but in the real scenario it can have different values.
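The issue is the missing lit around the variable comparison. Python evaluates otherThing == "" to a plain bool before Spark sees it, and the | operator on a Column cannot take a Python bool as its other operand, hence the or([class java.lang.Boolean]) error. Wrapping the comparison in lit turns it into a Column.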
https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.functions.lit.html
https://sparkbyexamples.com/pyspark/pyspark-lit-add-literal-constant/
Working code:
import pyspark.sql.functions as F

otherThing = ""
dataDF = spark.createDataFrame([(66, "a", "4"),
                                (67, "a", "0"),
                                (70, "b", "4"),
                                (71, "d", "4")],
                               ("id", "code", "amt"))
# F.lit wraps the Python bool so it can be combined with the Column condition
dataDF.withColumn("new_column", F.when((dataDF["id"] <= 70) | F.lit(otherThing == ""), "A").otherwise("B")).display()