Azure Data Factory v2 - wrong year copying from parquet to SQL DB
I'm having a weird issue with Azure Data Factory v2. A Spark job runs and produces Parquet files as output; an ADFv2 copy activity then takes the output Parquet and copies the data into an Azure SQL Database. Everything works fine except for dates! When the data lands in SQL, the year is off by 1969 years, so today's date (2018-11-22) lands as 3987-11-22.
I've tried changing the source and destination types between Date, DateTime, DateTimeOffset and String, but with no success. At the moment I'm correcting the dates in the database, but this is not really ideal.
I've opened the source Parquet files using Parquet Viewer, Spark and Python (desktop), and they all correctly show the year as 2018.
According to the Parquet date type definition (https://drill.apache.org/docs/parquet-format/#sql-types-to-parquet-logical-types), a date is stored as "the number of days from the Unix epoch, 1 January 1970".

ADF uses a .NET type to do the transformation. According to the .NET type definition, time values are measured in 100-nanosecond units called ticks, and a particular date is the number of ticks since 12:00 midnight, January 1, 0001 A.D. (C.E.): https://docs.microsoft.com/en-us/dotnet/api/system.datetime?view=netframework-4.7.2

It seems the extra 1969 years are added for this reason: the two epochs, year 1970 and year 0001, are 1969 years apart. I'm not sure whether this is a bug, though.
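To make the arithmetic concrete, here is a minimal Python sketch of the epoch mismatch described above. It only illustrates the numbers; it is not ADF's actual conversion code, and the helper names (`date_to_parquet_days`, `parquet_days_to_date`) are made up for this example. Python's `date.min` (0001-01-01) stands in for the .NET `DateTime` epoch.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)   # epoch used by the Parquet DATE type
DOTNET_EPOCH = date.min         # 0001-01-01, same calendar date as the .NET DateTime epoch


def date_to_parquet_days(d: date) -> int:
    """Encode a date the way Parquet DATE does: days since the Unix epoch."""
    return (d - UNIX_EPOCH).days


def parquet_days_to_date(days: int) -> date:
    """Decode a Parquet DATE value correctly, relative to the Unix epoch."""
    return UNIX_EPOCH + timedelta(days=days)


today = date(2018, 11, 22)
days = date_to_parquet_days(today)
decoded = parquet_days_to_date(days)

# The two epochs are 1970 - 1 = 1969 years apart.
year_gap = UNIX_EPOCH.year - DOTNET_EPOCH.year

# Assumption: the symptom behaves like this gap being added once too often.
shifted = decoded.replace(year=decoded.year + year_gap)

print(days, decoded, year_gap, shifted)
```

The shifted value matches the 3987-11-22 reported in the question, which is consistent with the epoch offset being applied one time too many somewhere in the copy, although the sketch cannot show where that happens.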
What is your Parquet data type? Is it Date? And what is the SQL data type? Could you provide the copy activity run ID, or maybe some Parquet sample data?
Based on the Parquet encoding definitions, no Date, DateTime, DateTimeOffset or String formats exist, so you do not need to try these formats.

Based on the data type mapping for Parquet files in Azure Data Factory, the DateTimeOffset format corresponds to Int96, so I suggest you try this conversion on the source Parquet file.
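For reference, this is roughly how a legacy Parquet INT96 timestamp is laid out, sketched in Python: 12 little-endian bytes, with the nanoseconds within the day in the first 8 bytes and the Julian day number in the last 4. The function name `decode_int96_timestamp` and the sample value are hypothetical, and whether the files in the question actually use INT96 is not known from the post.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def decode_int96_timestamp(raw: bytes) -> datetime:
    """Decode a 12-byte legacy Parquet INT96 timestamp into a UTC datetime."""
    if len(raw) != 12:
        raise ValueError("an INT96 timestamp must be exactly 12 bytes")
    # "<qi": little-endian int64 (nanoseconds of day), then int32 (Julian day).
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        days=days_since_epoch, microseconds=nanos_of_day // 1000
    )


# Hypothetical sample: midnight on 2018-11-22, which is 17857 days after the Unix epoch.
sample = struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH + 17857)
print(decode_int96_timestamp(sample))
```

Because INT96 carries its own day number rather than an offset from 1970, decoding it does not depend on which epoch the consumer assumes for plain dates.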