
How does a HiveQL filter condition treat NULL values?

I have a sample dataset such as the following:

Id  Name           ReferredBy
1   John Doe       NULL
2   Jane Smith     NULL
3   Anne Jenkins   2
4   Eric Branford  NULL
5   Pat Richards   1
6   Alice Barnes   2

If I want to select all records not referred by Jane Smith, I would use the following command:

SELECT Name FROM Customers WHERE ReferredBy <> 2;

On SQL Server, this excludes rows where ReferredBy is NULL, so I need to write it in the following way:

SELECT Name FROM Customers WHERE ReferredBy IS NULL OR ReferredBy <> 2

Does HiveQL have the same issue?

*It is hard to test this on the raw dataset I have, since it is quite large with very few missing values.

Thanks!

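The behavior of NULL is defined by the SQL standard, and Hive follows it like other databases: ReferredBy <> 2 evaluates to NULL rather than true when ReferredBy is NULL, so those rows are filtered out, just as on SQL Server. That said, the standard also specifies NULL-safe comparison operators, IS NOT DISTINCT FROM and IS DISTINCT FROM. Hive supports a NULL-safe operator for equality, but not under that syntax.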
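Since testing on the real table is awkward, here is a minimal sketch that runs the two queries from the question against the six sample rows. It uses Python's built-in sqlite3 module purely as a stand-in engine (an assumption on my part, this is not Hive), on the basis that the NULL handling of <> is the standard behavior described above. The helper name names is made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption: not Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, ReferredBy INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", None),
        (2, "Jane Smith", None),
        (3, "Anne Jenkins", 2),
        (4, "Eric Branford", None),
        (5, "Pat Richards", 1),
        (6, "Alice Barnes", 2),
    ],
)


def names(where_clause):
    """Hypothetical helper: return the Name column for rows matching the filter."""
    query = f"SELECT Name FROM Customers WHERE {where_clause} ORDER BY Id"
    return [row[0] for row in conn.execute(query)]


# NULL <> 2 evaluates to NULL, so rows with a NULL ReferredBy are dropped.
plain = names("ReferredBy <> 2")

# Handling NULL explicitly keeps those rows.
explicit = names("ReferredBy IS NULL OR ReferredBy <> 2")

print(plain)
print(explicit)
```

The first list contains only the customer referred by Id 1, while the second also includes the three customers with no referrer.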

For your logic, you can use Hive's <=> operator:

where not (ReferredBy <=> 2)

The <=> operator is the NULL-safe comparison, so it returns "true" for NULL <=> NULL and "false" for NULL <=> 2, instead of NULL in both cases. This is presumably borrowed from MySQL.
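To see those truth values concretely, here is a small sketch using Python's built-in sqlite3 module. SQLite has no <=> operator, so the sketch uses SQLite's IS operator, which is its NULL-safe equality; treating it as an analogue of Hive's <=> is an assumption made for illustration, and the queries are not Hive syntax.

```python
import sqlite3

# In-memory SQLite database; IS stands in for Hive's <=> (assumption).
conn = sqlite3.connect(":memory:")

# Ordinary equality yields NULL (None in Python) when an operand is NULL;
# the NULL-safe form always yields 1 (true) or 0 (false).
ordinary = conn.execute("SELECT NULL = NULL, NULL = 2").fetchone()
null_safe = conn.execute("SELECT NULL IS NULL, NULL IS 2").fetchone()
print(ordinary)
print(null_safe)

conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, ReferredBy INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", None),
        (2, "Jane Smith", None),
        (3, "Anne Jenkins", 2),
        (4, "Eric Branford", None),
        (5, "Pat Richards", 1),
        (6, "Alice Barnes", 2),
    ],
)

# Counterpart of: where not (ReferredBy <=> 2)
not_referred_by_jane = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM Customers WHERE NOT (ReferredBy IS 2) ORDER BY Id"
    )
]
print(not_referred_by_jane)
```

Because the NULL-safe comparison never returns NULL, negating it keeps the rows with no referrer without a separate IS NULL check.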
