How to replace null values in the output of a left join operation with 0 in a PySpark dataframe?
I have a simple PySpark dataframe, df1:
df1 = spark.createDataFrame([
("u1", 1),
("u1", 2),
("u2", 3),
("u3", 4),
],
['user_id', 'var1'])
print(df1.printSchema())
df1.show(truncate=False)
Output:
root
|-- user_id: string (nullable = true)
|-- var1: long (nullable = true)
None
+-------+----+
|user_id|var1|
+-------+----+
|u1 |1 |
|u1 |2 |
|u2 |3 |
|u3 |4 |
+-------+----+
I have another PySpark dataframe, df2:
df2 = spark.createDataFrame([
(1, 'f1'),
(2, 'f2'),
],
['var1', 'var2'])
print(df2.printSchema())
df2.show(truncate=False)
Output:
root
|-- var1: long (nullable = true)
|-- var2: string (nullable = true)
None
+----+----+
|var1|var2|
+----+----+
|1 |f1 |
|2 |f2 |
+----+----+
I have to join the two dataframes mentioned above, using a left join on them:
df1.join(df2, df1.var1==df2.var1, 'left').show()
Output:
+-------+----+----+----+
|user_id|var1|var1|var2|
+-------+----+----+----+
| u1| 1| 1| f1|
| u1| 2| 2| f2|
| u2| 3|null|null|
| u3| 4|null|null|
+-------+----+----+----+
But as you can see, I am getting null values in the rows for which the two tables don't have a match. How can I replace all the null values with 0?
You can use fillna. Two fillna calls are needed to cover both the integer and the string columns, because a numeric fill value only applies to numeric columns and a string fill value only applies to string columns.
df1.join(df2, df1.var1==df2.var1, 'left').fillna(0).fillna("0")
You can rename the columns after the join (otherwise you get columns with the same name) and use a dictionary to specify how you want to fill missing values:
df1.join(df2, df1.var1 == df2.var1, 'left').select(
*[df1['user_id'], df1['var1'], df2['var1'].alias('df2_var1'), df2['var2'].alias('df2_var2')]
).fillna({'df2_var1': 0, 'df2_var2': '0'}).show()
Output:
+-------+----+--------+--------+
|user_id|var1|df2_var1|df2_var2|
+-------+----+--------+--------+
| u1| 1| 1| f1|
| u2| 3| 0| 0|
| u1| 2| 2| f2|
| u3| 4| 0| 0|
+-------+----+--------+--------+