
PySpark: Replace values of dataframe based on criteria

I have a dataframe as shown below:

+------+------+------+
| colA | colB | colC |
+------+------+------+
|  123 |    3 |    0 |
|  222 |    0 |    1 |
|  200 |    0 |    2 |
+------+------+------+

I want to replace the values in colB and colC with a value of 1 if they are greater than 0.

I am able to use the na.fill function when I need to fill nulls with 0, but I am not sure how to do this kind of conditional replacement.

Assuming your dataframe is df, you can do the following:

from pyspark.sql.functions import when

df = df.select('colA',
               when(df.colB > 0, 1).otherwise(df.colB).alias('colB'),
               when(df.colC > 0, 1).otherwise(df.colC).alias('colC'))

This checks whether colB and colC are greater than 0 and assigns 1 where they are; the otherwise clause keeps the original value (0) so those rows do not become null.
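If you prefer not to list every column in a select, a withColumn-based variant works as well. This is a minimal sketch assuming the same df and column names:

from pyspark.sql.functions import col, when

# Overwrite colB and colC in place, leaving all other columns untouched;
# otherwise() preserves the existing value when it is not greater than 0.
df = df.withColumn('colB', when(col('colB') > 0, 1).otherwise(col('colB'))) \
       .withColumn('colC', when(col('colC') > 0, 1).otherwise(col('colC')))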
