
Merge multiple delimited files into Hive using ODI

Basically, I have 3 input files and I need to merge them into a single Hive table using ODI.

file 1: AcctNo,Name,Address

file 2: AcctNo,Block_Code,Block_Date

file 3: AcctNo,Balance1,Balance2

Hive: AcctNo,Name,Address,Block_Code,Block_Date,Balance1,Balance2

I'm pretty new to Hadoop and have been thrown into this project without proper training in Oracle Data Integrator (ODI). I read that Hive does not natively support updates, but that they can be enabled with ACID transactions.

Since my organisation uses ODI as its main data integration tool, I need this to be done in ODI. Can anyone tell me whether it can be done in a single ETL pass?

I was thinking of doing an ETL from the files into Hive first without merging, and only then doing the merge within Hive, but that seems time-consuming. Is there a better way to do this?

In my opinion, you could load the files into HDFS using either ODI or FTPS. Once the data is in HDFS, you have the flexibility to create external tables, which are just logical table definitions over the files, and then join/merge them into a separate table.

  • Loading the files into HDFS using hadoop CLI:

hadoop fs -copyFromLocal file1.csv /user/cloudera/data/file1

/user/cloudera/data is the HDFS path you need to provide as per your project.

  • Creating External Table in hive:

     create external table file1_table(
       AcctNo string,
       Name string,
       Address string
     )
     row format delimited
     fields terminated by ','                      -- assuming your file is comma separated
     stored as textfile
     location '/user/cloudera/data/file1'          -- the hdfs location
     tblproperties("skip.header.line.count"="1");  -- assuming file is having a header section.
  • Once you have the external tables ready, you could create a table that joins all of them (file1_table, file2_table, file3_table) on AcctNo into a single combined table.
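
The last step could look roughly like the sketch below. It is only an illustration under several assumptions that are not in the question: the HDFS paths for file2 and file3 follow the same pattern as file1, every column is kept as string, the target table name accounts_merged is made up, and file1 is assumed to contain every account (hence the left joins).

```sql
-- Sketch only: column types, HDFS paths and the target table name are assumptions.

-- External tables for the other two files, mirroring file1_table.
create external table file2_table(
  AcctNo string,
  Block_Code string,
  Block_Date string   -- kept as string; cast later once the date format is known
)
row format delimited
fields terminated by ','
stored as textfile
location '/user/cloudera/data/file2'   -- assumed path, same pattern as file1
tblproperties("skip.header.line.count"="1");

create external table file3_table(
  AcctNo string,
  Balance1 string,    -- kept as string; cast to a numeric type if appropriate
  Balance2 string
)
row format delimited
fields terminated by ','
stored as textfile
location '/user/cloudera/data/file3'   -- assumed path, same pattern as file1
tblproperties("skip.header.line.count"="1");

-- Merge the three sources on AcctNo into one managed table (CTAS).
-- Left joins assume file1 holds every account; use full outer joins if it does not.
create table accounts_merged as
select
  f1.AcctNo,
  f1.Name,
  f1.Address,
  f2.Block_Code,
  f2.Block_Date,
  f3.Balance1,
  f3.Balance2
from file1_table f1
left outer join file2_table f2 on f1.AcctNo = f2.AcctNo
left outer join file3_table f3 on f1.AcctNo = f3.AcctNo;
```

This also assumes each file has at most one row per AcctNo; if a file can repeat an account, the join will multiply rows and the duplicates would need to be resolved first. Because the merged table is built by a join rather than by updating rows, this approach should not need Hive ACID transactions.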


 