
HBase MapReduce output to HDFS & HBase

I have a MapReduce program that first scans an HBase table.

I want some reducer output to go to HDFS and some reducer output to be written to an HBase table. Can a reducer be configured to output to two different locations/formats like this?

A reducer can be configured to write to multiple files using the MultipleOutputs class. The documentation at the top of that class provides a clear example of writing to multiple named outputs. However, since MultipleOutputs is designed around file-based output formats rather than HBase tables, you might consider writing the second stream to a specific location on HDFS and then using another job to insert it into HBase.
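As a minimal sketch of that approach (assuming Text keys and values, and two hypothetical named outputs "forHdfs" and "forHbase" registered on the job with MultipleOutputs.addNamedOutput(), plus a made-up routing rule), the reducer could look roughly like this:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        // Wraps the task context so records can be routed to named outputs.
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Hypothetical routing rule: decide which named output a record goes to.
            // "forHdfs" and "forHbase" must have been registered on the job, e.g.
            // MultipleOutputs.addNamedOutput(job, "forHdfs", TextOutputFormat.class,
            //                                Text.class, Text.class);
            if (value.toString().startsWith("H")) {
                mos.write("forHdfs", key, value);
            } else {
                mos.write("forHbase", key, value);
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flush and close all named outputs.
        mos.close();
    }
}
```

The "forHbase" output here still lands in an HDFS directory; a follow-up job (or a bulk load) would then move that data into the HBase table.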

If you don't want to write much extra code, just open an HBase Table in your mapper's or reducer's setup() method and issue Put operations against the HBase table directly. At the same time, configure the job so that its normal output goes to an HDFS file. This way you get to write to both HBase and HDFS.

To elaborate: each context.write() sends a record to the HDFS output file, while each table.put() inserts the corresponding record into the HBase table.

Also, don't forget to close the table (and anything else you opened) in your cleanup() method. The only drawback is that if there are, say, 1,000 mappers, the table connection gets opened 1,000 times; but at any given point only your maximum number of concurrent mappers actually run, which might be around 50 depending on your setup. Works for me at least!
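A rough sketch of this second approach, assuming Text keys/values and a hypothetical table name, column family, and qualifier (my_output_table, cf, val):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DualWriteReducer extends Reducer<Text, Text, Text, Text> {
    private Connection connection;
    private Table table;

    @Override
    protected void setup(Context context) throws IOException {
        // One HBase connection per task attempt; "my_output_table" is a placeholder name.
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        connection = ConnectionFactory.createConnection(conf);
        table = connection.getTable(TableName.valueOf("my_output_table"));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Write the record to the job's normal HDFS output...
            context.write(key, value);
            // ...and also put it into the HBase table ("cf"/"val" are placeholder names).
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("val"),
                    Bytes.toBytes(value.toString()));
            table.put(put);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // Release HBase resources once the task is done.
        if (table != null) table.close();
        if (connection != null) connection.close();
    }
}
```

With the modern HBase client, the Connection is the heavyweight object and the Table handle is cheap, so opening one connection per task (as above) keeps the overhead per mapper/reducer reasonable.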
