
Best way to import data from MySQL to HDFS

I need to know whether there is any way to import data from MySQL to HDFS. There are some conditions I need to mention:

  • I know HBase, Hive, and Sqoop can help me, but I don't want any extra layers. Just MapReduce and the Hadoop Java API.
  • I also need HDFS to be updated as the data is updated in MySQL.

I need to know the best way to import MySQL data into HDFS and keep it updated in real time.

Why don't you want to use Sqoop? It does exactly what you would have to do yourself (open a JDBC connection, fetch the data, write it to Hadoop). See this presentation from Hadoop World '09.
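As a sketch of what that looks like in practice (the connection string, table name, and paths below are hypothetical placeholders, not from the question), a Sqoop incremental import that re-pulls only rows added since the last run would be invoked roughly like this:

```
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --table orders \
  --target-dir /data/orders \
  --incremental append \
  --check-column id \
  --last-value 0 \
  -m 4
```

The `--incremental append` / `--check-column` / `--last-value` flags are Sqoop's built-in mechanism for the "update HDFS as MySQL changes" requirement; `-m 4` runs four parallel mappers.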

You can do real-time import using CDC and Talend. http://www.talend.com/talend-big-data-sandbox

Yes, you can access the database and HDFS via JDBC connectors and the Hadoop Java API.

But in MapReduce, things will be out of your control when accessing a database:

  • Each mapper/reducer tries to establish a separate connection to the database, which eventually impacts database performance.
  • There is no way to tell which mapper/reducer processes which portion of the query result set.
  • If a single mapper/reducer is used to access the database, Hadoop's parallelism is lost.
  • A fault-tolerance mechanism has to be implemented in case any mapper/reducer fails.
  • The list goes on...
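To make the second point concrete: if you go the raw MapReduce route, you must partition the query result set across mappers yourself, typically by computing per-mapper LIMIT/OFFSET slices. Below is a minimal, self-contained sketch of that split arithmetic (plain Java, an illustration of the idea rather than Hadoop's actual `DBInputFormat` implementation; the class and method names are my own):

```java
// Illustrates how a bounded query result set can be divided into
// per-mapper LIMIT/OFFSET slices. Hadoop's DBInputFormat does a
// similar computation internally; this is a simplified stand-in.
public class DbSplits {

    // Returns one {offset, length} pair per mapper. The last mapper
    // absorbs the remainder when totalRows does not divide evenly.
    static long[][] computeSplits(long totalRows, int numMappers) {
        long[][] splits = new long[numMappers][2];
        long chunk = totalRows / numMappers;
        for (int i = 0; i < numMappers; i++) {
            long offset = i * chunk;
            long length = (i == numMappers - 1) ? totalRows - offset : chunk;
            splits[i][0] = offset;
            splits[i][1] = length;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Each printed line is the query one mapper would issue.
        for (long[] s : computeSplits(1000003, 4)) {
            System.out.printf("SELECT ... FROM t ORDER BY id LIMIT %d OFFSET %d%n",
                              s[1], s[0]);
        }
    }
}
```

Even with the split arithmetic handled, every mapper still opens its own JDBC connection and re-sorts the table for its OFFSET, which is exactly the kind of hidden cost the bullet points above describe.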

To overcome all these hurdles, Sqoop was developed to transfer data between an RDBMS and HDFS.
