How to load incremental data into a partitioned Hive table
I have a table "users" with the following columns, and I have created a Hive partition based on the created_on field:
id bigint,
name string,
created_on string(yyyy-MM-dd),
updated_on string
I have created a Sqoop job to import incrementally based on the last-modified date:
sqoop job --create users -- import --connect jdbc:mysql://<ip>/product --driver com.mysql.jdbc.Driver --username <> -P --table users --incremental lastmodified --check-column updated_on --last-value "2016-11-15" --hive-table users --hive-import --hive-partition-key created_on --hive-partition-value "2016-11-15" --m 1
If you observe the above job, it fetches rows based on the last-modified value (updated_on) but inserts them all into the single partition given by --hive-partition-value, which is the wrong partition for any row whose created_on differs from that date.
Is there any workaround for this issue?
You load into a partition keyed on one column, but expect rows to be placed based on a different column? That simply does not match.
The solution is to make the load and the partitions line up.
So if you want to write all records with created_on equal to 2016-11-15 into that partition, then make sure you load exactly those records. (In that case you probably cannot use the standard incremental functionality.)
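One way to make the load and the partition line up, as suggested above, is to drop --incremental and use a free-form --query import that selects only the rows belonging to the target partition. This is a sketch, not a tested job: the connection string, username, and date are placeholders carried over from the question, and you should verify the flags against your Sqoop version (a --target-dir is required with --query, and Sqoop requires the literal $CONDITIONS token in the WHERE clause).

```shell
# Sketch: import only the rows whose created_on matches the target partition,
# so the data loaded and the Hive partition value agree.
# <ip> and <user> are placeholders, as in the original job.
PART_DATE="2016-11-15"

sqoop import \
  --connect jdbc:mysql://<ip>/product \
  --driver com.mysql.jdbc.Driver \
  --username <user> -P \
  --query "SELECT id, name, created_on, updated_on FROM users \
           WHERE created_on = '${PART_DATE}' AND \$CONDITIONS" \
  --target-dir /tmp/users_${PART_DATE} \
  --hive-import \
  --hive-table users \
  --hive-partition-key created_on \
  --hive-partition-value "${PART_DATE}" \
  --m 1
```

Run once per created_on date you want to load; with --m 1 no --split-by column is needed. Rows updated after their creation date would still need a separate reconciliation step, since this import keys purely on created_on.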