
Scala: outer join on 2 DataFrame columns doesn't show rows where there are null values

I'm joining 2 DataFrames like so:

val joinCols = Array("first_name", "last_name")
val df_subset_joined = df1_subset.as("a").join(df2_subset.as("b"), joinCols, "full_outer")
df_subset_joined.show()

This is the result of the above code:

Dataframe of differences between 2 dataframes
+----------+---------+-------------+-------------+
|first_name|last_name|loyalty_score|loyalty_score|
+----------+---------+-------------+-------------+
|     will |    smith|           67|           67|
|   george |  clooney|           67|           67|
|   george |  clooney|           67|           88|
|    blake |   lively|           66|         null|
|    celena|    gomez|         null|            2|
|       eva|    green|           44|           56|
|      null|     null|             |         null|
|     jason|    momoa|           34|           34|
|        ed|  sheeran|           88|         null|
|    lionel|    messi|           88|           88|
|      kyle|   jenner|         null|           56|
|      tom |   cruise|           66|           34|
|      tom |   cruise|           66|           99|
|      brad|     pitt|           99|           78|
|      ryan| reynolds|           45|         null|
+----------+---------+-------------+-------------+

As you can see, some rows have null values.

Next, I run the following code:

val filter_str = s"a.$col"+" != "+s"b.$col"
val df_subset_filtered = df_subset_joined.filter(filter_str)
df_subset_filtered.show()

I get the following DataFrame:

Below is the dataframe of differences between DF1 and DF1 based on the comparison between:
a.loyalty_score != b.loyalty_score
+----------+---------+-------------+-------------+
|first_name|last_name|loyalty_score|loyalty_score|
+----------+---------+-------------+-------------+
|      tom |   cruise|           66|           99|
|      tom |   cruise|           66|           34|
|       eva|    green|           44|           56|
|      brad|     pitt|           99|           78|
|   george |  clooney|           67|           88|
+----------+---------+-------------+-------------+

Why don't I see the rows where there is a null value in one column and an actual value in the other? Shouldn't this satisfy value != null?

How can I make my filter statement keep the rows with null values in the final DataFrame?

The reason you don't get any rows where one column is null and the other is non-null is that, in SQL, comparing anything with null using != returns NULL rather than TRUE, and filter keeps only the rows for which the condition evaluates to TRUE.
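To make the three-valued logic concrete, here is a minimal plain-Scala sketch that does not use Spark at all. It models SQL NULL as `None`; the object and function names (`NullSemantics`, `sqlNotEqual`, `notNullSafeEqual`, `keptByFilter`) are hypothetical and exist only for this illustration.

```scala
// Illustrative model only: None stands in for SQL NULL.
object NullSemantics {
  // Models SQL `a != b`: the result is NULL (None) as soon as either side is NULL.
  def sqlNotEqual(a: Option[Int], b: Option[Int]): Option[Boolean] =
    for (x <- a; y <- b) yield x != y

  // Models SQL `not(a <=> b)`: always a definite true or false.
  // Two NULLs count as equal; a NULL and a value count as different.
  def notNullSafeEqual(a: Option[Int], b: Option[Int]): Boolean =
    a != b

  // A filter keeps a row only when the predicate is definitely true,
  // so a NULL predicate drops the row just like false does.
  def keptByFilter(predicate: Option[Boolean]): Boolean =
    predicate.contains(true)
}
```

In this model, a row such as (66, null) yields `None` from `sqlNotEqual` and is dropped by `keptByFilter`, whereas `notNullSafeEqual` reports the two sides as different.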

To avoid this, use the null-safe equality operator <=> in conjunction with not.

val filter_str = s"not(a.$col <=> b.$col)"
val df_subset_filtered = df_subset_joined.filter(filter_str)
df_subset_filtered.show()
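The same filter can also be written with the Column API instead of a SQL string. The sketch below assumes Spark SQL is on the classpath and that a local session is acceptable; the sample rows and the names `NullSafeFilterSketch` and `run` are made up for illustration and are not the asker's data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object NullSafeFilterSketch {
  // Returns (rows kept by the plain inequality, rows kept by the null-safe one).
  def run(): (Long, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")              // assumption: local mode is fine for a demo
      .appName("null-safe-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, loosely modelled on the question.
    val df1 = Seq(("blake", "lively", 66), ("tom", "cruise", 66), ("will", "smith", 67))
      .toDF("first_name", "last_name", "loyalty_score")
    val df2 = Seq(("tom", "cruise", 99), ("will", "smith", 67), ("kyle", "jenner", 56))
      .toDF("first_name", "last_name", "loyalty_score")

    val joined = df1.as("a")
      .join(df2.as("b"), Seq("first_name", "last_name"), "full_outer")

    val c = "loyalty_score"
    // Plain inequality: rows with a null on either side evaluate to NULL and are dropped.
    val plain = joined.filter(col(s"a.$c") =!= col(s"b.$c"))
    // Null-safe version: a null on exactly one side counts as a difference.
    val nullSafe = joined.filter(!(col(s"a.$c") <=> col(s"b.$c")))

    val counts = (plain.count(), nullSafe.count())
    spark.stop()
    counts
  }

  def main(args: Array[String]): Unit = println(run())
}
```

`Column.<=>` is the method form of the SQL operator; `eqNullSafe` is an equivalent spelling if the symbolic name is hard to read.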

From the documentation:

expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null and false if one of them is null.

Arguments:

expr1, expr2 - the two expressions must be of the same type or castable to a common type, and must be a type that can be used in equality comparison. Map type is not supported. For complex types such as array/struct, the data types of the fields must be orderable.

Examples:

SELECT 2 <=> 2;
 true

SELECT 1 <=> '1';
 true

SELECT true <=> NULL;
 false

SELECT NULL <=> NULL;
 true
