简体   繁体   中英

When condition in Pyspark dataframe

I have the below dataframe

Column_1     Column_2       Column_3
A            1
A            2
A            3
A            4
A            5
B            1
B            4
B            5
C            1
C            2

I have to populate Column_3 based on values in Column_1 and Column_2. If Column_1 in ('A','B') and Column_2 not in ('1','3','5') i have to populate column_3 with X else Y.

Expected Output:

Column_1     Column_2       Column_3
A            1              Y
A            2              X
A            3              Y
A            4              X
A            5              Y
B            1              Y            
B            4              X
B            5              Y
C            1              Y
C            2              Y

What I tried:

I tried with when and otherwise statement, but not sure how to use Not in along with the when statement. Any help on this would be highly appreciated

You can leverage isin and inverse ~ :

import pyspark.sql.functions as F

c = (F.when(df['Column_1'].isin(['A','B']) & 
      (~df['Column_2'].isin([1,3,5])),'X').otherwise('Y'))
df.withColumn("Column_3",c).show()

Or:

expr = """CASE 
        WHEN Column_1 IN ('A','B') and Column_2 NOT IN (1,3,5) 
        THEN 'X' ELSE 'Y' 
        END as Column_3"""
df.selectExpr("*",expr).show()

+--------+--------+--------+
|Column_1|Column_2|Column_3|
+--------+--------+--------+
|       A|       1|       Y|
|       A|       2|       X|
|       A|       3|       Y|
|       A|       4|       X|
|       A|       5|       Y|
|       B|       1|       Y|
|       B|       4|       X|
|       B|       5|       Y|
|       C|       1|       Y|
|       C|       2|       Y|
+--------+--------+--------+

More details:

df['Column_1'].isin(['A','B'])
#Column<b'(Column_1 IN (A, B))'>
~df['Column_2'].isin([1,3,5])
#Column<b'(NOT (Column_2 IN (1, 3, 5)))'>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM