Sqoop incremental loading into a partitioned Hive table

How do I load incremental data into a partitioned Hive table?

I have a table "users" with the following columns, and I have partitioned the Hive table on the created_on field:

id bigint,
name string,
created_on string(yyyy-MM-dd),
updated_on string

I have created a Sqoop job to import incrementally based on the last modified date:

sqoop job --create users -- import \
  --connect jdbc:mysql://<ip>/product \
  --driver com.mysql.jdbc.Driver \
  --username <> -P \
  --table users \
  --incremental lastmodified \
  --check-column updated_on \
  --last-value "2016-11-15" \
  --hive-import \
  --hive-table users \
  --hive-partition-key created_on \
  --hive-partition-value "2016-11-15" \
  --m 1

If you observe the above job, it fetches rows based on the last modified value (updated_on), but writes every fetched row into the single fixed partition created_on=2016-11-15. For example, a row with created_on=2016-11-10 that was updated on 2016-11-16 would be fetched and would land in the wrong partition.

Is there any workaround for this issue?

You load based on one column (updated_on), but expect the rows to be written out by a different column (created_on)? That simply does not match.

The solution appears to be to make the load and partitions line up.

So if you want to write all records with created_on equal to 2016-11-15, then make sure to load exactly those records. (I suppose you should not use the standard incremental functionality in this case; see the sketch below.)
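
For illustration, here is a minimal sketch of that approach, assuming one import run per created_on date: it drops --incremental lastmodified and instead uses Sqoop's --where option to load only the rows that belong to the target partition. The connection placeholders are copied from the question.

# Sketch: load only rows whose created_on matches the target partition,
# so the filter and --hive-partition-value always agree.
sqoop import \
  --connect jdbc:mysql://<ip>/product \
  --driver com.mysql.jdbc.Driver \
  --username <> -P \
  --table users \
  --where "created_on = '2016-11-15'" \
  --hive-import \
  --hive-table users \
  --hive-partition-key created_on \
  --hive-partition-value "2016-11-15" \
  --m 1

Because the --where filter and the partition value agree, every loaded row lands in its correct partition. Rows that are updated later would need a re-import of their created_on partition, since this sketch no longer tracks updated_on.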
