
How to import data from MySQL into an Apache Hadoop HDFS installation

How do I import data from MySQL into HDFS? I can't use Sqoop, because mine is a plain Apache Hadoop HDFS installation, not a Cloudera distribution. I used the link below to set up HDFS. My Hadoop version is 0.20.2: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Not directly related to your question, but if you want to use a database as the input to a MapReduce job and don't want to copy it into HDFS first, you can use DBInputFormat to read input directly from the database.

Apart from Sqoop, you could try hiho. I have heard good things about it (though I've never used it myself).

But mostly what I have seen is that people end up writing their own flows to do this. If hiho doesn't work out, you can dump the data out of MySQL with mysqldump (note: mysqlimport goes in the other direction, loading files *into* MySQL). Then load the dump into HDFS directly, or via a MapReduce job or Pig/Hive.
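A minimal sketch of that manual dump-and-load flow. The database name `mydb`, table name `employees`, and HDFS paths here are made-up placeholders for illustration; this assumes a MySQL server on the local host and a running HDFS:

```shell
# Export the table as tab-separated text on the MySQL host.
# mysqldump --tab writes a .sql schema file and a .txt data file per
# table into the given directory; the MySQL *server* process must be
# able to write there.
mysqldump --tab=/tmp/export --fields-terminated-by='\t' mydb employees

# Copy the flat data file into HDFS with the ordinary filesystem shell
# (on Hadoop 0.20.2 the command is "hadoop fs").
hadoop fs -mkdir /user/hduser/employees
hadoop fs -put /tmp/export/employees.txt /user/hduser/employees/

# Verify the upload.
hadoop fs -ls /user/hduser/employees
```

From there a MapReduce job, or a Pig/Hive table pointed at that directory, can process the tab-separated records.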

I have heard Sqoop is pretty good and is widely used (this is hearsay again; I have never used it myself). Now that it is an Apache Incubator project, I think it may have started supporting Apache releases of Hadoop, or at least become less painful on non-Cloudera versions. The docs do say it supports Apache Hadoop v0.21. Try to make it work with your Hadoop version; it might not be that difficult.
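If Sqoop does end up working on your Hadoop version, the whole import collapses to a single command. A sketch, with the host, database, table, user, and target directory all assumed placeholders (Sqoop also needs the MySQL JDBC driver jar on its classpath):

```shell
# Import one MySQL table into HDFS as text files.
# -P prompts for the password instead of putting it on the command line.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username dbuser -P \
  --table employees \
  --target-dir /user/hduser/employees \
  --num-mappers 1
```

`--num-mappers 1` keeps it to a single map task; for larger tables you can raise it and let Sqoop split the work on the table's primary key.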

