
Strange behaviour on timestamp data type in Hadoop SQL

I am trying to retrieve the records that match the conditions below from a Hadoop database:

select
  CUSTOMER_SITE_NBR as account_site_nbr,
  SITE_USE_ID as account_site_use_id,
  CREATION_DATE_TIME as create_date_time,
  LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from
  hub_customer.dim_site_use_mdm
where
  cast(CREATION_DATE_TIME as date) between '2020-02-01' and '2020-02-29'
  and cast(LAST_UPDATE_DATE_TIME as date) = '2020-02-28'
  and site_use_code <> 'HEADQUARTER'
order by
  account_site_nbr,
  account_site_use_id,
  create_date_time,
  main_source_last_update_date_time;

The query returned the following records:

[Screenshot: records returned by the query]

As you can see, every value in the main_source_last_update_date_time column has a time part of 00:00:00. The data in our database rarely has a time part of 00:00:00.
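For comparison, under ordinary SQL semantics the casts in the WHERE clause only decide which rows qualify; the selected columns should come back with their original time part. The Python sketch below models that filter on a few hypothetical rows (the IDs, timestamps and site codes are made up, not taken from hub_customer.dim_site_use_mdm). It illustrates the expected result only and does not model any Hadoop query engine or optimizer.

```python
from datetime import date, datetime

# Hypothetical sample rows: (site_use_id, creation_ts, last_update_ts, site_use_code)
rows = [
    ("A", datetime(2020, 2, 3, 9, 15, 42), datetime(2020, 2, 28, 17, 4, 11), "BILL_TO"),
    ("B", datetime(2020, 2, 1, 0, 0, 5), datetime(2020, 2, 28, 8, 30, 0), "SHIP_TO"),
    ("C", datetime(2020, 1, 31, 12, 0, 0), datetime(2020, 2, 28, 10, 0, 0), "BILL_TO"),
    ("D", datetime(2020, 2, 10, 6, 45, 0), datetime(2020, 2, 28, 11, 0, 0), "HEADQUARTER"),
]


def matches(row):
    """Mirror of the WHERE clause: compare only the date part of each timestamp."""
    _, created, updated, code = row
    return (
        date(2020, 2, 1) <= created.date() <= date(2020, 2, 29)  # BETWEEN is inclusive
        and updated.date() == date(2020, 2, 28)
        and code != "HEADQUARTER"
    )


# The projection keeps the original timestamps; the .date() calls above
# build new date objects and do not modify the row values.
result = sorted((r[0], r[1], r[2]) for r in rows if matches(r))
for site_use_id, created, updated in result:
    print(site_use_id, created, updated)
```

Rows A and B pass the filter and keep their full timestamps (17:04:11 and 08:30:00), which is the behaviour the other tables show.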

I tested two other cases:

Case 1: this gave an incorrect result.

select
  CUSTOMER_SITE_NBR as account_site_nbr,
  SITE_USE_ID as account_site_use_id,
  CREATION_DATE_TIME as create_date_time,
  LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from
  hub_customer.dim_site_use_mdm
where
  cast(CREATION_DATE_TIME as date) between '2020-02-01' and '2020-02-29'
  and cast(LAST_UPDATE_DATE_TIME as date) = '2020-02-28'
  and site_use_code <> 'HEADQUARTER'
  and SITE_USE_ID = '100000010853754'
order by
  account_site_nbr,
  account_site_use_id,
  create_date_time,
  main_source_last_update_date_time;

[Screenshot: Case 1 result]

Case 2:

select
  CUSTOMER_SITE_NBR as account_site_nbr,
  SITE_USE_ID as account_site_use_id,
  CREATION_DATE_TIME as create_date_time,
  LAST_UPDATE_DATE_TIME as main_source_last_update_date_time
from
  hub_customer.dim_site_use_mdm
where
  SITE_USE_ID = '100000010853754';

[Screenshot: Case 2 result]

The second case returns the correct data. Neither query has a CAST in its SELECT list. It looks as though the main_source_last_update_date_time column was converted to DATE and then back to TIMESTAMP, which would explain the 00:00:00 in the records. The issue occurs only with this table: we run similar SQL queries against other tables, and they return correct results.
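The suspected round trip is easy to reproduce outside Hadoop. The Python sketch below (standard library only; the timestamp value is made up) truncates a timestamp to a date and then widens that date back to a timestamp. The time of day is lost and comes back as 00:00:00, which is the pattern seen in the first two result sets.

```python
from datetime import datetime, time

# Hypothetical value resembling a LAST_UPDATE_DATE_TIME entry.
original = datetime(2020, 2, 28, 17, 4, 11)

# Analogue of CAST(ts AS DATE): keep only the calendar date.
as_date = original.date()

# Analogue of casting that DATE back to TIMESTAMP: midnight of the same day.
round_tripped = datetime.combine(as_date, time.min)

print(original)       # 2020-02-28 17:04:11
print(round_tripped)  # 2020-02-28 00:00:00
```

If the query output looks like `round_tripped` rather than `original`, the projected column is probably being derived from the date-level value somewhere in the query plan. Prefixing the failing query with `EXPLAIN` and comparing its plan with the plan for Case 2 may help locate where that happens.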

How can I find the cause of this issue, and what is the correct way to fix it?

Kind regards,

Have you tried casting LAST_UPDATE_DATE_TIME as a timestamp in the SELECT list? Something like this:

select
  CUSTOMER_SITE_NBR as account_site_nbr,
  SITE_USE_ID as account_site_use_id,
  CREATION_DATE_TIME as create_date_time,
  cast(LAST_UPDATE_DATE_TIME as timestamp) as main_source_last_update_date_time
from 
  hub_customer.dim_site_use_mdm  
where 
  to_date(CREATION_DATE_TIME) between '2020-02-01' and '2020-02-29'
and  
  to_date(LAST_UPDATE_DATE_TIME) = '2020-02-28' 
and
  site_use_code <> 'HEADQUARTER' 
order by
  account_site_nbr,
  account_site_use_id,
  create_date_time,
  main_source_last_update_date_time;
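Filtering on `to_date(LAST_UPDATE_DATE_TIME) = '2020-02-28'` keeps exactly the rows whose timestamp falls in the half-open interval from 2020-02-28 00:00:00 (inclusive) to 2020-02-29 00:00:00 (exclusive). If the conversion in the WHERE clause turns out to be what triggers the problem, comparing the raw column against that range is one way to express the same filter without converting the column at all; this is an untested suggestion, not something verified against this table. The Python sketch below only checks the equivalence on a few made-up timestamps, including both boundaries.

```python
from datetime import date, datetime, timedelta

day = date(2020, 2, 28)
start = datetime(day.year, day.month, day.day)  # 2020-02-28 00:00:00
end = start + timedelta(days=1)                 # 2020-02-29 00:00:00


def by_date_part(ts):
    # Analogue of to_date(ts) = '2020-02-28'
    return ts.date() == day


def by_range(ts):
    # Analogue of ts >= '2020-02-28 00:00:00' and ts < '2020-02-29 00:00:00'
    return start <= ts < end


samples = [
    datetime(2020, 2, 27, 23, 59, 59),
    start,
    datetime(2020, 2, 28, 17, 4, 11),
    datetime(2020, 2, 28, 23, 59, 59, 999999),
    end,
]

# Both predicates agree on every sample, including the interval boundaries.
for ts in samples:
    assert by_date_part(ts) == by_range(ts), ts

print([by_range(ts) for ts in samples])  # [False, True, True, True, False]
```

The exclusive upper bound matters: using `<=` on the end value would wrongly include rows stamped exactly at midnight on 2020-02-29.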
