
HBase MapReduce output to HDFS & HBase

I have a MapReduce program that first scans an HBase table.

I want some reducer output to go to HDFS and some reducer output to be written to an HBase table. Can a reducer be configured to output to two different locations/formats like this?

A reducer can be configured to write to multiple files using the MultipleOutputs class. The documentation at the top of that class provides a clear example of writing to multiple named outputs. However, since MultipleOutputs is designed around file-based output formats rather than HBase tables, you might consider writing the second stream to a specific location on HDFS and then using another job to insert it into HBase.
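As a minimal sketch of that approach (assuming Text keys and values, and two hypothetical named outputs "forHdfs" and "forHbase" registered on the job with MultipleOutputs.addNamedOutput(), plus a made-up routing rule), the reducer could look roughly like this:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        // Wraps the task context so records can be routed to named outputs.
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Hypothetical routing rule: decide which named output a record goes to.
            // "forHdfs" and "forHbase" must have been registered on the job, e.g.
            // MultipleOutputs.addNamedOutput(job, "forHdfs", TextOutputFormat.class,
            //                                Text.class, Text.class);
            if (value.toString().startsWith("H")) {
                mos.write("forHdfs", key, value);
            } else {
                mos.write("forHbase", key, value);
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flush and close all named outputs.
        mos.close();
    }
}
```

The "forHbase" output here still lands in an HDFS directory; a follow-up job (or a bulk load) would then move that data into the HBase table.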

If you don't want to write much extra code, just open an HBase Table in your mapper's or reducer's setup() method and issue Put operations against the HBase table directly. At the same time, configure the job so that its normal output goes to an HDFS file. This way you get to write to both HBase and HDFS.

To elaborate: each context.write() sends a record to the HDFS output file, while each table.put() inserts the corresponding record into the HBase table.

Also, don't forget to close the table (and anything else you opened) in your cleanup() method. The only drawback is that if there are, say, 1,000 mappers, the table connection gets opened 1,000 times; but at any given point only your maximum number of concurrent mappers actually run, which might be around 50 depending on your setup. Works for me at least!
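A rough sketch of this second approach, assuming Text keys/values and a hypothetical table name, column family, and qualifier (my_output_table, cf, val):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DualWriteReducer extends Reducer<Text, Text, Text, Text> {
    private Connection connection;
    private Table table;

    @Override
    protected void setup(Context context) throws IOException {
        // One HBase connection per task attempt; "my_output_table" is a placeholder name.
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        connection = ConnectionFactory.createConnection(conf);
        table = connection.getTable(TableName.valueOf("my_output_table"));
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // Write the record to the job's normal HDFS output...
            context.write(key, value);
            // ...and also put it into the HBase table ("cf"/"val" are placeholder names).
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("val"),
                    Bytes.toBytes(value.toString()));
            table.put(put);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // Release HBase resources once the task is done.
        if (table != null) table.close();
        if (connection != null) connection.close();
    }
}
```

With the modern HBase client, the Connection is the heavyweight object and the Table handle is cheap, so opening one connection per task (as above) keeps the overhead per mapper/reducer reasonable.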
