Date variable is NULL while loading CSV data into Hive external table
I am trying to load a SAS dataset into a Hive external table. To do that, I first converted the SAS dataset to a CSV file. In the SAS dataset, the attributes of the date variable (as_of_dt) are:
LENGTH=8 , FORMAT= DATE9. , INFORMAT=DATE9. , LABEL=as_of_dt
To convert the SAS dataset to CSV, I used the code below. I used a RETAIN statement earlier in SAS so that the order of the variables is preserved.
proc export data=input_SASdataset_for_csv_conv
outfile= "/mdl/myData/final_merged_table_201501.csv"
dbms=csv
replace;
putnames=no;
run;
Up to this point (i.e., up to the creation of the CSV file), the date variable is read correctly. But when I then load the file into a Hive external table with the command below, the date variable (as_of_dt) is assigned NULL:
CREATE EXTERNAL TABLE final_merged_table_20151(as_of_dt DATE, client_cm_id STRING, cm11 BIGINT, cm_id BIGINT, corp_id BIGINT, iclic_id STRING, mkt_segment_cd STRING, product_type_cd STRING, rated_company_id STRING, recovery_amt DOUBLE, total_bal_amt DOUBLE, write_off_amt DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/mdl/myData';
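The NULL is consistent with how Hive reads a DATE column from a text file. It expects values in YYYY-MM-DD form and reads a value it cannot parse as NULL. The Python sketch below is only a rough approximation of that check, not Hive's actual parser. The function name and the sample value 10NOV2015 (a DATE9.-style value) are hypothetical.

```python
import re
from datetime import date
from typing import Optional

# Rough approximation (not Hive's real code) of reading a DATE column
# from a text file: only YYYY-MM-DD strings are accepted, anything else
# becomes NULL, represented here as None.
_ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def parse_like_hive_date(field: str) -> Optional[date]:
    """Return a date for a YYYY-MM-DD string, or None if it does not match."""
    m = _ISO_DATE.match(field.strip())
    if m is None:
        return None  # e.g. DATE9.-style text such as "10NOV2015"
    year, month, day = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # This sketch also rejects impossible dates such as 2015-02-30;
        # Hive versions may handle such values differently.
        return None


if __name__ == "__main__":
    print(parse_like_hive_date("10NOV2015"))   # None
    print(parse_like_hive_date("2015-11-10"))  # 2015-11-10
```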
Also, when I run this command in Hive:
desc formatted final_merged_table_201501
I get the following table parameters:
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1447151851
Even though it shows numRows=-1, I can still see data in the table with this Hive command:
SELECT * FROM final_merged_table_20151 limit 10;
However, the date variable (as_of_dt) is stored as NULL. Where might the problem be?
Based on madhu's comment, you need to change the format of as_of_dt to yymmdd10. Hive's DATE type expects values in YYYY-MM-DD form, so a DATE9. value such as 10NOV2015 cannot be parsed and is loaded as NULL.
You can change the format with PROC DATASETS. Here is an example:
data test;
/*Test data with AS_OF_DT formatted date9. per your question*/
format as_of_dt date9.;
do as_of_dt=today() to today()+5;
output;
end;
run;
proc datasets lib=work nolist;
/*Modify Test Data Set and set format for AS_OF_DT variable*/
modify test;
attrib as_of_dt format=yymmdd10.;
run;
quit;
/*Create CSV*/
proc export file="C:\temp\test.csv"
data=test
dbms=csv
replace;
putnames=no;
run;
If you open the CSV, you will see the dates in YYYY-MM-DD format.
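Changing the format is enough because SAS stores a date as a number of days since 1 January 1960. DATE9. and YYMMDD10. only control how that number is written out, and PROC EXPORT writes the formatted text. The Python sketch below imitates the two formats for illustration only. It is not SAS code, and the helper names are hypothetical.

```python
from datetime import date, timedelta

# SAS stores a date as the number of days since 1 January 1960;
# formats such as DATE9. and YYMMDD10. only change how it is printed.
SAS_EPOCH = date(1960, 1, 1)

_MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def to_python_date(sas_days: int) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a datetime.date."""
    return SAS_EPOCH + timedelta(days=sas_days)


def like_date9(sas_days: int) -> str:
    """Imitate DATE9. output such as 10NOV2015 (hypothetical helper)."""
    d = to_python_date(sas_days)
    return f"{d.day:02d}{_MONTHS[d.month - 1]}{d.year:04d}"


def like_yymmdd10(sas_days: int) -> str:
    """Imitate YYMMDD10. output such as 2015-11-10 (hypothetical helper)."""
    return to_python_date(sas_days).isoformat()


if __name__ == "__main__":
    value = (date(2015, 11, 10) - SAS_EPOCH).days  # SAS date value for 10 Nov 2015
    print(value, like_date9(value), like_yymmdd10(value))  # 20402 10NOV2015 2015-11-10
```

The stored value is the same in both cases. Only the second rendering matches the YYYY-MM-DD form that the Hive DATE column can read.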