In my pyspark DataFrame I have two columns, price1 and price2. I want to create a new column result based on the formula ((price1 - price2) / price1). However, I also want to check that neither price1 nor price2 is null, and that price1 is not 0.
How can I correctly create a new column using these conditions?
Now I have this:
df = df.withColumn("result", df["price1"]-df["price2"]/df["price1"])
I think you can do it like this:
df = df.withColumn("result", (df["price1"] - df["price2"]) / df["price1"]).fillna(0)
If you can use a udf:
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

safe_div = F.udf(lambda x, y: 0.0 if x is None or y is None or x == 0 else (x - y) / x, DoubleType())
df = df.withColumn("result", safe_div(df["price1"], df["price2"]))
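The null/zero guard inside that udf can be sketched in plain Python, without a Spark session, to show how the edge cases resolve (safe_result is a hypothetical helper name, not part of pyspark):

```python
def safe_result(price1, price2):
    # Return 0 when either input is null or price1 is 0,
    # otherwise compute (price1 - price2) / price1.
    if price1 is None or price2 is None or price1 == 0:
        return 0.0
    return (price1 - price2) / price1

print(safe_result(10.0, 5.0))   # 0.5
print(safe_result(None, 5.0))   # 0.0
print(safe_result(10.0, None))  # 0.0
print(safe_result(0.0, 5.0))    # 0.0
```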
This is how it can be done in Scala:
import org.apache.spark.sql.functions.when

val result = df.withColumn("result",
  when(df("price1").isNull || df("price2").isNull || df("price1") === 0, 0)
    .otherwise((df("price1") - df("price2")) / df("price1")))