I have a dataframe as shown below:
+------+------+------+
| colA | colB | colC |
+------+------+------+
|  123 |    3 |    0 |
|  222 |    0 |    1 |
|  200 |    0 |    2 |
+------+------+------+
I want to replace the values in colB and colC with 1 if they are greater than 0. I know I can use the na.fill function to fill nulls with 0, but I am not sure how to do this kind of conditional replacement.
Assuming your dataframe is df, you can do the following:

from pyspark.sql.functions import when

df = df.select(
    'colA',
    when(df.colB > 0, 1).otherwise(df.colB).alias('colB'),
    when(df.colC > 0, 1).otherwise(df.colC).alias('colC'),
)
This checks whether colB and colC are greater than 0 and, if so, replaces the value with 1. The otherwise clause keeps the original value for the remaining rows; without it, when would return null wherever the condition is not met.