
How to create a new column based on calculations made in other columns in PySpark

I have the following DataFrame:

+-----------+----------+----------+
|   some_id | one_col  | other_col|
+-----------+----------+----------+
|       xx1 |        11|       177|         
|       xx2 |      1613|      2000|    
|       xx4 |         0|     12473|      
+-----------+----------+----------+

I need to add a new column computed from the two value columns: the percentage that one_col represents of other_col. For example, for one_col = 1 and other_col = 10, the new value should be (1/10)*100 = 10%:

+-----------+----------+----------+--------------+
|   some_id | one_col  | other_col|  percentage  |
+-----------+----------+----------+--------------+
|       xx1 |        11|       177|     6.2      |  
|       xx3 |         1|       10 |      10      |     
|       xx2 |      1613|      2000|     80.6     |
|       xx4 |         0|     12473|      0       |
+-----------+----------+----------+--------------+

I assume I need a UDF for this, but how do I pass the two columns to it and add the result as a new column?

Some pseudo-code:

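from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

df = load_my_df  # placeholder for however the DataFrame is loaded

def my_udf(val1, val2):
    # percentage of val2 that val1 represents
    return (val1 / val2) * 100

udf_percentage = udf(my_udf, FloatType())

df = df.withColumn('percentage', udf_percentage(...))  # how do I pass the two columns here?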

Thank you!

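Pass the two columns to the UDF as arguments. Any of the following forms works, starting with plain column names:

df.withColumn('percentage', udf_percentage("one_col", "other_col"))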

or

df.withColumn('percentage', udf_percentage(df["one_col"], df["other_col"]))

or

df.withColumn('percentage', udf_percentage(df.one_col, df.other_col))

or

from pyspark.sql.functions import col

df.withColumn('percentage', udf_percentage(col("one_col"), col("other_col")))
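One caveat with the UDF from the question: a Python UDF receives None for null cells, and a zero in other_col makes the division raise ZeroDivisionError inside the task. Below is a minimal sketch of a more defensive variant. The names safe_percentage and add_percentage_with_udf are hypothetical, and returning null for undefined cases is an assumption about the desired behavior, not something the question specifies.

```python
from typing import Optional


def safe_percentage(val1, val2) -> Optional[float]:
    """Return val1 as a percentage of val2, or None when it is undefined."""
    # Spark passes SQL nulls to Python UDFs as None.
    if val1 is None or val2 is None or val2 == 0:
        return None
    return 100.0 * val1 / val2


def add_percentage_with_udf(df):
    """Add a 'percentage' column to df using the UDF (hypothetical helper)."""
    # Imported here so safe_percentage can be tested without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    percentage_udf = udf(safe_percentage, DoubleType())
    # Column names are taken from the question's DataFrame.
    return df.withColumn("percentage", percentage_udf("one_col", "other_col"))
```

Keeping the arithmetic in a plain Python function means it can be checked on its own, for example safe_percentage(1, 10), before it is wrapped with udf.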

But why not skip the UDF and just use a column expression, which Spark evaluates natively:

df.withColumn('percentage', col("one_col") / col("other_col") * 100)
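The expected output in the question shows one decimal place, so the native expression can also be rounded. The sketch below writes the same calculation as a SQL expression string passed to DataFrame.selectExpr, which needs no imports. The function name with_percentage and its parameters are hypothetical, and the use of nullif to turn a zero denominator into null is an assumption about the desired behavior.

```python
def with_percentage(df, numerator="one_col", denominator="other_col"):
    """Return df with an added 'percentage' column, rounded to one decimal."""
    # nullif(x, 0) yields null when x is 0, so the division yields null
    # instead of depending on how the session handles division by zero.
    expr = (
        f"round(100 * {numerator} / nullif({denominator}, 0), 1) AS percentage"
    )
    # "*" keeps all existing columns; expr appends the new one.
    return df.selectExpr("*", expr)
```

Usage would be df = with_percentage(df). Since the column names are interpolated into a SQL string, this should only be called with trusted, fixed column names.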


 