How to do an incremental import from MySQL to Hive using Sqoop?
I can successfully do an incremental import from MySQL to HDFS using Sqoop with:
sqoop job -create JOBNAME ... --incremental append --check-column id --last-value LAST
sqoop job -exec JOBNAME
That finishes with log messages like:
INFO tool.ImportTool: Saving incremental import state to the metastore
INFO tool.ImportTool: Updated data for job: JOBNAME
And inspecting the job reveals that incremental.last.value was updated correctly.
If I attempt the same procedure, but add --hive-import to the definition of my job, it will execute successfully, but won't update incremental.last.value.
Is this a bug? Intended behavior? Does anyone have a procedure for incrementally importing data from MySQL and making it available via Hive?
I basically want my Hadoop cluster to be a read slave of my MySQL database, for fast analysis. If there's some other solution than Hive (Pig would be fine), I'd love to hear that too.
The --hive-import option creates the defined table structure on HDFS using MapReduce jobs. Moreover, Hive applies schema on read: the data is not actually materialized against the schema until a query is executed. So every time you run a query, it is executed freshly against the table's schema in Hive, and the last incremental value is not stored. Every query on the Hive schema is treated as independent, since it runs at execution time and doesn't store old results.
You can also create an external Hive table manually, since that is only a one-time activity, and then keep importing the incremental data. We can get the last value using a script like the following:
--check-column id --incremental append --last-value $($HIVE_HOME/bin/hive -e 'select max(id) from tablename')
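Putting that together, the workaround is to skip the saved job state and supply --last-value yourself from a Hive query. A minimal sketch, assuming a table named tablename with an integer key column id; the JDBC URL, username, and the sample value 42 are placeholders, not from the original post:

```shell
# In practice you would fetch the value from Hive:
#   LAST_VALUE=$("$HIVE_HOME"/bin/hive -e 'select max(id) from tablename')
# A fixed placeholder stands in here so the sketch runs without a cluster.
LAST_VALUE=42

# Build the plain (non-job) sqoop import, passing the fetched value explicitly.
SQOOP_CMD="sqoop import \
 --connect jdbc:mysql://dbhost/dbname --table tablename --username user -P \
 --hive-import --hive-table tablename \
 --incremental append --check-column id --last-value $LAST_VALUE"

# Print the command that would be run; on a real cluster you would execute it.
echo "$SQOOP_CMD"
```

Because --last-value comes from Hive on each run rather than from the Sqoop metastore, it no longer matters that --hive-import fails to update incremental.last.value.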