HiveQL: how a logical filter statement treats NULL values
I have a sample dataset like the following:
Id  Name           ReferredBy
1   John Doe       NULL
2   Jane Smith     NULL
3   Anne Jenkins   2
4   Eric Branford  NULL
5   Pat Richards   1
6   Alice Barnes   2
If I want to select all records not referred by Jane Smith, I would use the following query:
SELECT Name FROM Customers WHERE ReferredBy <> 2;
On SQL Server, this also excludes the rows where ReferredBy is NULL, so I need to write it this way:
SELECT Name FROM Customers WHERE ReferredBy IS NULL OR ReferredBy <> 2
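If the full dataset is too large to check by hand, the two queries above can be compared on the sample rows alone. This is a minimal sketch that loads those rows into an in-memory SQLite database from Python. SQLite is only an assumed stand-in for standard SQL NULL handling (it is not Hive or SQL Server). The helper run and the added ORDER BY Id are illustrative choices, not part of the original queries.

```python
import sqlite3

# Sample rows from the question; None is stored as SQL NULL.
ROWS = [
    (1, "John Doe", None),
    (2, "Jane Smith", None),
    (3, "Anne Jenkins", 2),
    (4, "Eric Branford", None),
    (5, "Pat Richards", 1),
    (6, "Alice Barnes", 2),
]


def run(query):
    """Hypothetical helper: run a query on an in-memory Customers table, return names."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, ReferredBy INTEGER)")
        conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", ROWS)
        return [name for (name,) in conn.execute(query)]
    finally:
        conn.close()


# NULL <> 2 is UNKNOWN, so the NULL rows are filtered out along with the 2s.
plain = run("SELECT Name FROM Customers WHERE ReferredBy <> 2 ORDER BY Id")
# The explicit IS NULL check brings the NULL rows back.
with_null_check = run(
    "SELECT Name FROM Customers WHERE ReferredBy IS NULL OR ReferredBy <> 2 ORDER BY Id"
)
print(plain)            # ['Pat Richards']
print(with_null_check)  # ['John Doe', 'Jane Smith', 'Eric Branford', 'Pat Richards']
```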
Does HiveQL have the same issue?
*It is hard to test this on the raw dataset I have, since it is quite large and has very few missing values.
Thanks!
The behavior of NULL is defined by the SQL standard, and all databases respect it. That said, the standard also specifies NULL-safe comparison operators, IS NOT DISTINCT FROM and IS DISTINCT FROM. Hive supports a NULL-safe operator for equality, but not the inequality one you would need here.
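The difference between a plain comparison and the standard NULL-safe one can be modeled in a few lines of Python, with None standing in for both NULL and the UNKNOWN truth value. This is a sketch of the standard semantics under that assumption, not Hive code. The function names sql_ne, is_distinct_from, and where_keeps are hypothetical.

```python
from typing import Optional

Value = Optional[int]   # None models SQL NULL
Truth = Optional[bool]  # None models the UNKNOWN truth value


def sql_ne(a: Value, b: Value) -> Truth:
    """Standard a <> b: UNKNOWN whenever either operand is NULL."""
    if a is None or b is None:
        return None
    return a != b


def is_distinct_from(a: Value, b: Value) -> bool:
    """Standard a IS DISTINCT FROM b: never UNKNOWN; two NULLs are not distinct."""
    if a is None and b is None:
        return False
    if a is None or b is None:
        return True
    return a != b


def where_keeps(predicate: Truth) -> bool:
    """WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN)."""
    return predicate is True


for referred_by in (None, 1, 2):
    ne = sql_ne(referred_by, 2)
    distinct = is_distinct_from(referred_by, 2)
    print(referred_by, ne, where_keeps(ne), distinct, where_keeps(distinct))
# None None False True True
# 1 True True True True
# 2 False False False False
```

The first row of output is the whole issue: for a NULL ReferredBy, <> yields UNKNOWN and the row is dropped, while the NULL-safe form yields TRUE and the row is kept.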
For your logic, you can use Hive's NULL-safe equality extension, <=>:
SELECT Name FROM Customers WHERE NOT (ReferredBy <=> 2);
The <=> operator is the NULL-safe comparison, so it returns "true" for NULL <=> NULL and "false" for NULL <=> 2, instead of NULL in both cases. As a result, NOT (ReferredBy <=> 2) is true for rows where ReferredBy is NULL, and those rows are kept. This operator is presumably borrowed from MySQL.
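Because spinning up Hive for a quick check is heavy, here is a minimal sketch that uses SQLite (through Python's built-in sqlite3 module) as an assumed stand-in. SQLite's IS operator is NULL-safe equality, so NOT (ReferredBy IS 2) plays the role of Hive's NOT (ReferredBy <=> 2). This checks the logic on the sample rows only; it does not run Hive itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question; None becomes SQL NULL.
conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, ReferredBy INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "John Doe", None), (2, "Jane Smith", None), (3, "Anne Jenkins", 2),
     (4, "Eric Branford", None), (5, "Pat Richards", 1), (6, "Alice Barnes", 2)],
)

# SQLite's IS stands in for Hive's <=>: TRUE (1) for NULL IS NULL, FALSE (0) for NULL IS 2.
null_null, null_two = conn.execute("SELECT NULL IS NULL, NULL IS 2").fetchone()

# NULL-safe filter: the NULL rows survive because NOT (NULL IS 2) is NOT FALSE, i.e. TRUE.
names = [name for (name,) in conn.execute(
    "SELECT Name FROM Customers WHERE NOT (ReferredBy IS 2) ORDER BY Id")]
conn.close()

print(null_null, null_two)  # 1 0
print(names)  # ['John Doe', 'Jane Smith', 'Eric Branford', 'Pat Richards']
```

On the sample data, the NULL-safe filter keeps the same four customers as the explicit IS NULL OR ReferredBy <> 2 rewrite, which is why it can replace that two-part condition.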
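Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.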