
How to create a new column in pyspark?

In my pyspark DataFrame I have two columns, price1 and price2. I want to create a new column result based on the formula (price1 - price2) / price1. However, I also want to check that neither price1 nor price2 is null, and that price1 is not 0.

How can I correctly create a new column using these conditions?

Now I have this:

df = df.withColumn("result", df["price1"]-df["price2"]/df["price1"])

I think you can do it like this:

df = df.withColumn("result", df["price1"]-df["price1"]/df["price2"]).fillna(0)

If you can use a UDF:

from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

# Return 0 when either price is null or price1 is 0; otherwise apply the formula
compute_result = F.udf(lambda x, y: 0.0 if x is None or y is None or x == 0 else (x - y) / x, DoubleType())
df = df.withColumn("result", compute_result(df["price1"], df["price2"]))
df = df.withColumn("result", 
when(df.col("price1").isNull OR df.col("price2").isNull OR df.col("price1")==0,0)
.otherwise(df.col("price1")-df.col("price2")/df.col("price1")))

This is how it can be done using scala..
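The PySpark equivalent of that when/otherwise expression (a sketch, assuming df already holds the two price columns):

from pyspark.sql import functions as F

# Guard against nulls and a zero denominator before dividing
df = df.withColumn(
    "result",
    F.when(F.col("price1").isNull() | F.col("price2").isNull() | (F.col("price1") == 0), 0)
     .otherwise((F.col("price1") - F.col("price2")) / F.col("price1")),
)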
