
How do I sync a MySQL table to a Hive table? (sqoop --incremental lastmodified is not supported for Hive imports)

I want to sync a MySQL table into a Hive table. Because records in the orders table usually change again shortly after they are written, I need to propagate those updates to Hive.

For example:

  1. I dump all MySQL data into Hive.
  2. A daily job checks for records whose time_update falls within the last day and updates them in the Hive table.

I have tried --incremental lastmodified, as shown below:

sqoop import \
"-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--connect $DB_URL \
--username $USERNAME \
--password $PASSWORD \
--direct \
--fields-terminated-by '\t' \
--target-dir '/data/hive/' \
--delete-target-dir \
--hive-database $HIVE_DB \
--hive-table $HIVE_TABLE \
--hive-import \
--hive-overwrite \
--create-hive-table \
--query 'select * from '$HIVE_TABLE' where $CONDITIONS' \
--split-by id \
-m 6 \
--merge-key id \
--incremental lastmodified \
--check-column time_update \
--last-value "2019-01-01 21:00:00"

This fails with the error: --incremental lastmodified option for hive imports is not supported. Please remove the parameter --incremental lastmodified.

What is the proper way to do this without the --incremental lastmodified option?

First, remove the --delete-target-dir and --create-hive-table arguments. In an incremental import the target directory has to stay as it is, so --delete-target-dir cannot be combined with --incremental. The Hive table should also be created only once, so drop --create-hive-table, create the table manually in Hive with the same schema, and pass that table's location as --target-dir.

sqoop import \
--connect <<db_url>> \
--username <<username>> \
--password <<password>> \
--direct \
--fields-terminated-by '\t' \
--hive-database <<hive_db>> \
--hive-table <<hive_table>> \
--hive-import \
--hive-overwrite \
--query 'select * from <<db_table>> where $CONDITIONS' \
--split-by product_id \
-m 6 \
--merge-key product_id \
--incremental lastmodified \
--check-column timedate \
--last-value 0 \
--target-dir <<hive_table_location>>   # e.g. /user/hive/warehouse/problem5.db/products_hive
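
The Hive table location used for --target-dir can be read from Hive itself. The following is a minimal sketch, assuming the hive CLI is on the PATH and that the table's path contains no whitespace. The helper names extract_location and table_location are hypothetical, not part of Hive or Sqoop.

```shell
#!/bin/sh
# Sketch: look up the storage location of an existing Hive table so that it
# can be passed to sqoop as --target-dir. Helper names are illustrative only.

# Read `DESCRIBE FORMATTED` output on stdin and print the Location value.
# Assumption: the path itself contains no whitespace.
extract_location() {
  awk '$1 == "Location:" { print $2; exit }'
}

# Ask Hive for the table's location (requires the hive CLI).
# Usage: table_location <database> <table>
table_location() {
  hive -S -e "DESCRIBE FORMATTED $1.$2" | extract_location
}

# Example (not executed here):
#   TARGET_DIR="$(table_location problem5 products_hive)"
```

Keeping the parsing in its own function makes it possible to check it against saved DESCRIBE FORMATTED output without a running Hive installation.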

This should work. If it does not, let me know.
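
For the daily job described in the question, the --last-value cutoff can be computed at run time. If your Sqoop version still rejects --incremental lastmodified together with --hive-import (the error quoted in the question), one possible workaround is to drop the Hive flags and let Sqoop merge straight into the table's HDFS location. The sketch below takes that route and has not been tested against a cluster. It assumes GNU date is available, the Hive table already exists over TARGET_DIR as tab-delimited text, and DB_URL, USERNAME, PASSWORD, DB_TABLE and TARGET_DIR are set by the caller. It uses the question's column names (id, time_update). The names last_value, sync_since and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch of a daily incremental sync without --hive-import.
# Expects DB_URL, USERNAME, PASSWORD, DB_TABLE and TARGET_DIR in the environment.

# Print the cutoff timestamp N days before now (GNU date syntax; default 1 day).
last_value() {
  date -d "${1:-1} days ago" '+%Y-%m-%d %H:%M:%S'
}

# Import rows modified since the given timestamp and merge them by id into
# TARGET_DIR. With DRY_RUN=1 the command is printed instead of executed.
# Usage: sync_since '<yyyy-mm-dd HH:MM:SS>'
sync_since() {
  runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"
  fi
  $runner sqoop import \
    --connect "$DB_URL" \
    --username "$USERNAME" \
    --password "$PASSWORD" \
    --fields-terminated-by '\t' \
    --query "select * from ${DB_TABLE} where \$CONDITIONS" \
    --split-by id \
    -m 6 \
    --target-dir "$TARGET_DIR" \
    --incremental lastmodified \
    --check-column time_update \
    --merge-key id \
    --last-value "$1"
}

# Example daily invocation (e.g. from cron), not executed here:
#   sync_since "$(last_value 1)"
```

Running it once with DRY_RUN=1 prints the command that would be executed, which is a cheap way to check the computed cutoff before touching any data.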
