
Data from multiple MySQL tables to Hadoop map-reduce

We have the following scenario:

We have a chain of map-reduce processes implemented in Java. Currently we read data from a MySQL table and save the output to another MySQL table. Now we may need data from an additional table as input to the map/reduce process.

Possible Solutions:

a) We can use a join query to produce the input for the map process (see the sketch after this list), or

b) we can read the needed data over a plain JDBC connection, requesting it again and again (although I don't prefer this).
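A minimal sketch of option (a), assuming Hadoop's DBInputFormat (which accepts an arbitrary SQL query as the job input) together with the MySQL JDBC driver on the classpath. The table and column names (orders, customers, customer_id, name), the connection details, and the JoinInputJob/JoinRecord class names are made up for illustration; the mapper/reducer wiring is only a placeholder for the existing chain:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinInputJob {

    // One row of the join result; DBInputFormat needs both Writable and DBWritable.
    public static class JoinRecord implements Writable, DBWritable {
        long orderId;
        String customerName;

        public void readFields(ResultSet rs) throws SQLException {
            orderId = rs.getLong(1);          // o.id
            customerName = rs.getString(2);   // c.name
        }

        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, orderId);
            ps.setString(2, customerName);
        }

        public void readFields(DataInput in) throws IOException {
            orderId = in.readLong();
            customerName = in.readUTF();
        }

        public void write(DataOutput out) throws IOException {
            out.writeLong(orderId);
            out.writeUTF(customerName);
        }

        public String toString() {
            return orderId + "," + customerName;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // JDBC connection details for the source database (placeholders).
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost/test", "dbuser", "dbpassword");

        Job job = Job.getInstance(conf, "mysql-join-input");
        job.setJarByClass(JoinInputJob.class);
        job.setInputFormatClass(DBInputFormat.class);

        // The join query becomes the map input; the count query lets
        // DBInputFormat split the joined result set across mappers.
        DBInputFormat.setInput(job, JoinRecord.class,
                "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id",
                "SELECT COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.id");

        // Placeholder wiring: plug in the existing mapper/reducer chain here.
        job.setMapperClass(Mapper.class);        // identity mapper
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(JoinRecord.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The count query should match the row count of the join query, since DBInputFormat uses it to decide how to divide the result set among mappers.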

Questions:

What are the best practices in such a scenario? We may move to MongoDB in the future. What would be the best practice in that case?

I don't think this is possible at the moment.

SQOOP and HIVE can be used.

You can use SQOOP to transfer data from a MySQL table to HDFS and then into HIVE. From HIVE (after your operations), you can export the tables back to MySQL (a sample export command follows the import example below).

Example :

  • First of all, download mysql-connector-java-5.0.8 and put the jar into Sqoop's lib and bin folders
  • Create the table definition in Hive with exactly the same field names and types as in MySQL

sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://localhost/test --table employee --hive-import --warehouse-dir /user/hive/warehouse --split-by id --hive-table employee
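For the export step back to MySQL mentioned above, a sqoop export along the following lines can be used; the target table name (employee_result) and the export directory are placeholders and should point at an existing MySQL table and the HDFS/Hive data you want to export:

sqoop export --connect jdbc:mysql://localhost/test --table employee_result --export-dir /user/hive/warehouse/employee --input-fields-terminated-by ','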

Follow this Link for reference
