
HiveQL: how logical filter statements treat NULL values

I have a sample dataset such as the following:

Id  Name           ReferredBy
1   John Doe       NULL
2   Jane Smith     NULL
3   Anne Jenkins   2
4   Eric Branford  NULL
5   Pat Richards   1
6   Alice Barnes   2

If I want to select all records not referred by Jane Smith, I would use the following query:

SELECT Name FROM Customers WHERE ReferredBy <> 2;

On SQL Server, this excludes the rows where ReferredBy is NULL, so I need to write it like this:

SELECT Name FROM Customers WHERE ReferredBy IS NULL OR ReferredBy <> 2;

Does HiveQL have the same issue?

It is hard to test this on my raw dataset, since it is quite large and has very few missing values.

Thanks!

The behavior of NULL is defined by the SQL standard, and all databases respect it, Hive included: when ReferredBy is NULL, ReferredBy <> 2 evaluates to NULL rather than true, so the WHERE clause filters that row out. That said, the standard also specifies the NULL-safe comparison operators IS NOT DISTINCT FROM and IS DISTINCT FROM. Hive supports a NULL-safe operator for equality, but not IS DISTINCT FROM.
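To see this on the sample data from the question, here is a minimal HiveQL sketch. It assumes a Hive version that supports WITH and UNION ALL inside a CTE, and it uses a hypothetical inline customers CTE in place of the real Customers table. Instead of filtering, it prints each row's value for both predicates so the NULL results are visible.

```sql
-- Hypothetical inline copy of the sample data (stands in for the real table).
-- CAST gives the NULLs an INT type so every UNION ALL branch has the same schema.
WITH customers AS (
  SELECT 1 AS id, 'John Doe'      AS name, CAST(NULL AS INT) AS referredby
  UNION ALL
  SELECT 2 AS id, 'Jane Smith'    AS name, CAST(NULL AS INT) AS referredby
  UNION ALL
  SELECT 3 AS id, 'Anne Jenkins'  AS name, 2                 AS referredby
  UNION ALL
  SELECT 4 AS id, 'Eric Branford' AS name, CAST(NULL AS INT) AS referredby
  UNION ALL
  SELECT 5 AS id, 'Pat Richards'  AS name, 1                 AS referredby
  UNION ALL
  SELECT 6 AS id, 'Alice Barnes'  AS name, 2                 AS referredby
)
SELECT name,
       referredby <> 2                       AS plain_ne,        -- NULL for ids 1, 2, 4
       referredby IS NULL OR referredby <> 2 AS with_null_check  -- true for ids 1, 2, 4, 5
FROM customers;
```

WHERE keeps a row only when its predicate is true; a NULL result is dropped just like false. That is why WHERE ReferredBy <> 2 on its own returns only Pat Richards, while the IS NULL OR variant also returns John Doe, Jane Smith and Eric Branford.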

For your logic, you can negate Hive's <=> extension:

SELECT Name FROM Customers WHERE NOT (ReferredBy <=> 2);

<=> is the NULL-safe equality comparison: it returns true for NULL <=> NULL and false for NULL <=> 2, where the plain = would return NULL in both cases. This operator is presumably borrowed from MySQL.
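The difference between = and <=> can be checked without any table. This is a minimal sketch that assumes a Hive version that allows SELECT without a FROM clause. The column aliases are illustrative, and the typed NULLs come from CAST.

```sql
-- Plain = with a NULL operand yields NULL; <=> always yields true or false.
SELECT
  CAST(NULL AS INT) =   2                 AS eq_null_2,        -- NULL
  CAST(NULL AS INT) <=> 2                 AS nullsafe_null_2,  -- false
  CAST(NULL AS INT) <=> CAST(NULL AS INT) AS nullsafe_both,    -- true
  NOT (CAST(NULL AS INT) <=> 2)           AS filter_keeps_row; -- true
```

The last column is the expression from the filter above evaluated for a NULL ReferredBy. Because it is true rather than NULL, those rows survive the WHERE clause without a separate IS NULL check.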
