简体   繁体   中英

How to replace null values in the output of a left join operation with 0 in pyspark dataframe?

I have a simple PySpark dataframe, df1-

df1 = spark.createDataFrame([
    ("u1", 1),
    ("u1", 2),
    ("u2", 3),
    ("u3", 4),

    ],
    ['user_id', 'var1'])

print(df1.printSchema())
df1.show(truncate=False)

Output-

root
 |-- user_id: string (nullable = true)
 |-- var1: long (nullable = true)

None
+-------+----+
|user_id|var1|
+-------+----+
|u1     |1   |
|u1     |2   |
|u2     |3   |
|u3     |4   |
+-------+----+

I have another PySpark dataframe df2-

df2 = spark.createDataFrame([
    (1, 'f1'),
    (2, 'f2'),

    ],
    ['var1', 'var2'])

print(df2.printSchema())
df2.show(truncate=False)

Output-

root
 |-- var1: long (nullable = true)
 |-- var2: string (nullable = true)

None
+----+----+
|var1|var2|
+----+----+
|1   |f1  |
|2   |f2  |
+----+----+

I have to join the two dataframes mentioned above, by using a left-join operation on them-

df1.join(df2, df1.var1==df2.var1, 'left').show()

Output-

+-------+----+----+----+
|user_id|var1|var1|var2|
+-------+----+----+----+
|     u1|   1|   1|  f1|
|     u1|   2|   2|  f2|
|     u2|   3|null|null|
|     u3|   4|null|null|
+-------+----+----+----+

But as you can see, I am getting null values in the rows for which there two tables don't have a match. How can I replace all the null values with 0?

You can use fillna . Two fillnas are needed to account for integer and string columns.

df1.join(df2, df1.var1==df2.var1, 'left').fillna(0).fillna("0")

You can rename columns after join (otherwise you get columns with the same name) and use a dictionary to specify how you want to fill missing values:

f1.join(df2, df1.var1 == df2.var1, 'left').select(
    *[df1['user_id'], df1['var1'], df2['var1'].alias('df2_var1'), df2['var2'].alias('df2_var2')]
).fillna({'df2_var1': 0, 'df2_var2': '0'}).show()

Output:

+-------+----+--------+--------+
|user_id|var1|df2_var1|df2_var2|
+-------+----+--------+--------+
|     u1|   1|       1|      f1|
|     u2|   3|       0|       0|
|     u1|   2|       2|      f2|
|     u3|   4|       0|       0|
+-------+----+--------+--------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM