Timestamp data value different between Hive tables and Databricks Delta tables

We did a binary copy of the data from Hive to ADLS and validated the checksums. Values match for every other datatype, but timestamp columns show different values between the Hive and Delta (Azure Databricks) tables.

select abcdtstmp from xyz.abc where mn_ID = "sdsdsd-7878-0016"
2018-01-16 00:00:00.0 (on prem)
select abcdtstmp from xyz.abc where mn_ID = "sdsdsd-7878-0016"
2018-01-16T05:00:00.000+0000 (DBX)

The checksums and all other validation match, but the value that appears after the 'T' is causing concern. Any suggestion would be helpful.

This seems to be related to the timezone and Hive.
Hive always assumes that timestamps in Parquet files are stored in UTC and converts them to the local system time (the cluster host's time) on output. So even if you are transferring data from EST to EST, Hive is the culprit.
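To see why the two outputs in the question can describe the same instant, here is a minimal Python sketch. It assumes the on-prem cluster runs on US Eastern Standard Time with a fixed UTC-5 offset (the thread only says "EST"); the variable names are illustrative and not part of Hive or Databricks.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the on-prem Hive host is on EST, modelled here as a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")

# Value shown by Hive on prem: 2018-01-16 00:00:00.0 (local wall-clock time).
on_prem = datetime(2018, 1, 16, 0, 0, 0, tzinfo=EST)

# The same instant expressed in UTC, which is how Databricks displayed it.
as_utc = on_prem.astimezone(timezone.utc)

# Format similar to the Databricks output: ISO date, 'T', milliseconds, numeric offset.
dbx_style = (
    as_utc.strftime("%Y-%m-%dT%H:%M:%S.")
    + f"{as_utc.microsecond // 1000:03d}"
    + as_utc.strftime("%z")
)
print(dbx_style)  # 2018-01-16T05:00:00.000+0000
```

Under that assumption the five-hour difference is only a display offset, which would be consistent with the checksums matching.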

If your Hive version is higher than 1.2, you can follow this link, https://issues.apache.org/jira/browse/HIVE-9482, and set hive.parquet.timestamp.skip.conversion=true. Otherwise, you need to manually convert the data back to EST, or whatever timezone you want, using the SQL below.

from_utc_timestamp(to_utc_timestamp(my_dt_tm,'America/New_York'),'America/Denver') AS local_time
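As a rough illustration of what that expression computes, the Python sketch below mimics the two SQL functions on naive datetimes. The helper functions are hypothetical stand-ins written for this example, not the Hive or Spark implementations, and the input value is taken from the question.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database


def to_utc_timestamp(ts: datetime, tz: str) -> datetime:
    """Read naive ts as wall-clock time in tz and return naive UTC wall-clock time."""
    return ts.replace(tzinfo=ZoneInfo(tz)).astimezone(timezone.utc).replace(tzinfo=None)


def from_utc_timestamp(ts: datetime, tz: str) -> datetime:
    """Read naive ts as UTC and return naive wall-clock time in tz."""
    return ts.replace(tzinfo=timezone.utc).astimezone(ZoneInfo(tz)).replace(tzinfo=None)


my_dt_tm = datetime(2018, 1, 16, 0, 0, 0)

# Same nesting as the SQL: New York wall-clock -> UTC -> Denver wall-clock.
local_time = from_utc_timestamp(
    to_utc_timestamp(my_dt_tm, "America/New_York"), "America/Denver"
)
print(local_time)  # 2018-01-15 22:00:00
```

The zone names in the SQL are examples; swap in the source and target timezones that apply to your data.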
