How to incremental import from MySQL to Hive using Sqoop?

I can successfully do an incremental import from MySQL to HDFS using Sqoop by running:

sqoop job --create JOBNAME -- import ... --incremental append --check-column id --last-value LAST
sqoop job --exec JOBNAME
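
For reference, a fuller job definition might look like the sketch below; the JDBC URL, credentials, table name, and target directory are placeholders, not details from the original question:

# A minimal sketch of a saved incremental-append job; the connection
# string, credentials, table, and target directory are hypothetical.
sqoop job --create JOBNAME \
  -- import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser \
  --password-file /user/me/.mysql-password \
  --table mytable \
  --target-dir /user/me/mytable \
  --incremental append \
  --check-column id \
  --last-value 0

# Each execution imports only rows whose id exceeds the stored last
# value, then saves the new incremental.last.value to the metastore.
sqoop job --exec JOBNAME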

That finishes with log messages like

INFO tool.ImportTool: Saving incremental import state to the metastore
INFO tool.ImportTool: Updated data for job: JOBNAME

And inspecting the job reveals that incremental.last.value was updated correctly.

If I attempt the same procedure, but add "--hive-import" to the definition of my job, it will execute successfully, but won't update incremental.last.value.

Is this a bug? Intended behavior? Does anyone have a procedure for incrementally importing data from MySQL and making it available via Hive?

I basically want my Hadoop cluster to be a read slave of my MySQL database, for fast analysis. If there's some other solution than Hive (Pig would be fine), I'd love to hear that too.

The --hive-import option is used to create the defined table structure on HDFS using MapReduce jobs. Moreover, data read into Hive follows schema-on-read, which means the data is not actually bound to the table until a query is executed. So every time you run a query, it is evaluated freshly against the schema on the Hive table, and the last incremental value does not get stored.

Every query on the Hive schema is treated as independent, since it runs at execution time and does not store old results.

You can also create the external Hive table manually, since that is only a one-time activity, and then continue importing the incremental data.
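
As an illustration of that one-time setup, an external table can be mapped onto the HDFS directory that the plain incremental import writes to; in this sketch the table name, columns, delimiter, and location are assumptions, not taken from the question:

# One-time activity (a sketch): map an external Hive table onto the
# directory used by the HDFS incremental import. The table name,
# columns, delimiter, and LOCATION are hypothetical; the comma
# delimiter matches Sqoop's default text output.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS mytable (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/me/mytable';"

After this, each sqoop job --exec JOBNAME simply appends new files under /user/me/mytable, and Hive queries pick them up automatically thanks to schema-on-read.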

We can get the last value using the following script:

--check-column id --incremental append --last-value "$($HIVE_HOME/bin/hive -e 'select max(id) from tablename')"

(Use --incremental lastmodified instead of append if the check column is a timestamp.)
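
Wired into a full command, that would look something like the sketch below; the JDBC URL, credentials, and directory names are hypothetical:

# Capture the maximum id already visible in Hive, then import only
# newer rows from MySQL. All names and the JDBC URL are hypothetical.
LAST=$("$HIVE_HOME"/bin/hive -e 'select max(id) from tablename')

sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser \
  --password-file /user/me/.mysql-password \
  --table tablename \
  --target-dir /user/me/tablename \
  --incremental append \
  --check-column id \
  --last-value "$LAST"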
