Date variable is NULL while loading csv data into hive External table

I am trying to load a SAS dataset into a Hive external table. To do that, I first converted the SAS dataset to a CSV file. In the SAS dataset, the contents of the date variable (as_of_dt) show: LENGTH=8, FORMAT=DATE9., INFORMAT=DATE9., LABEL=as_of_dt. To convert the SAS dataset to CSV, I used the code below (I used a RETAIN statement beforehand in SAS so that the order of the variables is maintained):

proc export data=input_SASdataset_for_csv_conv
        outfile=  "/mdl/myData/final_merged_table_201501.csv"
        dbms=csv
        replace;
        putnames=no;
run;

Up to this point (that is, through creation of the CSV file), the date variable is read correctly. But when I then load the file into a Hive external table using the command below, the date variable (as_of_dt) comes back as NULL:

CREATE EXTERNAL TABLE final_merged_table_20151(
        as_of_dt DATE,
        client_cm_id STRING,
        cm11 BIGINT,
        cm_id BIGINT,
        corp_id BIGINT,
        iclic_id STRING,
        mkt_segment_cd STRING,
        product_type_cd STRING,
        rated_company_id STRING,
        recovery_amt DOUBLE,
        total_bal_amt DOUBLE,
        write_off_amt DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/mdl/myData';

Also, when I run desc formatted final_merged_table_201501 in Hive, I get the following table parameters:

Table Parameters:
    COLUMN_STATS_ACCURATE   false
    EXTERNAL                TRUE
    numFiles                0
    numRows                 -1
    rawDataSize             -1
    totalSize               0
    transient_lastDdlTime   1447151851

Even though it shows numRows=-1, I can still see data in the table with the Hive command SELECT * FROM final_merged_table_20151 limit 10; but the date variable (as_of_dt) is stored as NULL. Where might the problem be?

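Based on madhu's comment, you need to change the format on as_of_dt to yymmdd10. In a text file, Hive's DATE type expects values written as YYYY-MM-DD, while DATE9. writes values such as 10NOV2015, which Hive cannot parse as a DATE and so returns as NULL.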
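
To make the difference between the two formats concrete, here is a minimal sketch in Python (not part of the original SAS workflow; the function name date9_to_iso is hypothetical) that turns a DATE9.-style string into the YYYY-MM-DD form that yymmdd10. writes. It assumes English month abbreviations.

```python
from datetime import datetime


def date9_to_iso(value: str) -> str:
    """Convert a SAS DATE9.-style string (e.g. '10NOV2015') to 'YYYY-MM-DD'.

    Hypothetical helper for illustration only.
    """
    # %d%b%Y = day of month, abbreviated month name, four-digit year.
    # Month-name matching assumes an English locale.
    parsed = datetime.strptime(value.strip(), "%d%b%Y")
    return parsed.date().isoformat()


print(date9_to_iso("10NOV2015"))  # 2015-11-10
```

The left-hand form is what DATE9. puts in the CSV; the right-hand form is what the export contains once the format is yymmdd10.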

You can do that with PROC DATASETS. Here is an example:

data test;
   /*Test data with AS_OF_DT formatted date9. per your question*/
   format as_of_dt date9.;
   do as_of_dt=today() to today()+5;
      output;
   end;
run;

proc datasets lib=work nolist;
/*Modify Test Data Set and set format for AS_OF_DT variable*/
   modify test;
     attrib as_of_dt format=yymmdd10.;
   run;
quit;

/*Create CSV*/
proc export file="C:\temp\test.csv"
            data=test
            dbms=csv
            replace;
        putnames=no;
run;

If you open the CSV, you will see the date in YYYY-MM-DD format.
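
If the file is too large to inspect by eye, a quick check before pointing Hive at it could look like the sketch below. It is written in Python rather than SAS, the function name bad_date_rows is hypothetical, and it assumes the date is the first column and the file has no header row (as with putnames=no).

```python
import csv
import io
import re
from datetime import datetime

# Exactly four digits, dash, two digits, dash, two digits.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def bad_date_rows(lines, column=0):
    """Return (row_number, value) pairs whose date column is not YYYY-MM-DD.

    `lines` is any iterable of CSV text lines, such as an open file.
    Hypothetical helper for illustration only.
    """
    bad = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        value = row[column].strip()
        try:
            if not ISO_DATE.fullmatch(value):
                raise ValueError(value)
            # Also reject impossible dates such as 2015-13-40.
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            bad.append((row_number, value))
    return bad


sample = io.StringIO("2015-11-10,A1,5\n10NOV2015,A2,7\n")
print(bad_date_rows(sample))  # [(2, '10NOV2015')]
```

An empty result means every value in that column is in the YYYY-MM-DD form; any row reported still carries the old format or an invalid date.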
