Scripted MapReduce with local directory input and HBase output

Original post: 2014-08-08 13:11:54 · Tags: hadoop, mapreduce, hbase, hadoop-streaming

Sometimes I'd like to run some simple, light-weight MapReduce jobs. "Simple" means that the job uses a very simple algorithm, and "light-weight" means that I can implement it in a few lines of some scripting language (or something similar).

My current task is to read data from files in a directory on the local filesystem, do some minimal processing, and write the result to HBase. Hadoop Streaming can read from the local filesystem; however, it cannot write to HBase. The hadoop-hbase-streaming project claims to provide this functionality. Unfortunately, I couldn't get it to work, probably because the last commit to its repository was in 2008. My task seems fairly common, so I wonder why the hadoop-hbase-streaming library hasn't been updated since 2008. I assume there are other ways to achieve this nowadays. Could you tell me what they are?
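Hadoop Streaming already covers the "local directory in, minimal processing" half of this task. A minimal sketch of such a mapper in Python is shown here, assuming (hypothetically) comma-separated input lines whose first field should become the HBase row key; the script name `mapper.py`, the input format, and the `map` command-line argument are illustrative assumptions, not part of the original question. The mapper only reshapes each line into a tab-separated record on stdout, which is what a streaming mapper emits.

```python
import sys
from typing import Iterable, Optional, TextIO


def process_line(line: str) -> Optional[str]:
    """Turn one hypothetical 'id,field1,field2,...' line into a TSV record.

    The first field is kept first (intended as the row key), the remaining
    fields follow in order. Blank lines and lines with an empty id are
    skipped by returning None. Assumes values contain no tab characters.
    """
    fields = [f.strip() for f in line.rstrip("\r\n").split(",")]
    if not fields or not fields[0]:
        return None
    return "\t".join(fields)


def run(lines: Iterable[str], out: TextIO) -> int:
    """Write one TSV record per usable input line; return how many were written."""
    written = 0
    for line in lines:
        record = process_line(line)
        if record is not None:
            out.write(record + "\n")
            written += 1
    return written


# Only read stdin when invoked explicitly as `python3 mapper.py map`
# (the "map" argument is a hypothetical convention for this sketch).
if __name__ == "__main__" and sys.argv[1:] == ["map"]:
    run(sys.stdin, sys.stdout)
```

As a map-only streaming job this would be started with something like `-input file:///path/to/dir -mapper "python3 mapper.py map" -numReduceTasks 0`. Because streaming still cannot write to HBase by itself, the TSV output needs a separate load step; one option that we believe fits is HBase's bundled ImportTsv tool (`org.apache.hadoop.hbase.mapreduce.ImportTsv` with `-Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b`, where the column family and qualifiers are hypothetical), but check its options against your HBase version.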

1 answer

I once wrote a MapReduce job that loaded data from the local filesystem into HBase on an old version of Hadoop (Hadoop 1; I don't remember which release), and I recently had to rewrite it because the Hadoop libraries are now completely different (I'm currently using CDH 5.0.1). So I'm not surprised that hadoop-hbase-streaming doesn't work. However, I found that the simplest and easiest method (for me) to upload data from a local directory into HBase is Pig. I tried this example and it worked perfectly for me:

Using Pig to Bulk Load Data Into HBase
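The linked article is not reproduced here. As we understand Pig's `HBaseStorage` store function, the first field of each stored tuple becomes the row key and the remaining fields are written, in order, to the columns listed in its constructor argument (for example `'cf:a cf:b'`). The Python sketch below only simulates that mapping in memory so it can be checked without a cluster; it does not talk to HBase, and the comma-separated input, the `cf:a`/`cf:b` column names, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def tuple_to_hbase_row(fields: Sequence[str], columns: List[str]) -> Tuple[str, Dict[str, str]]:
    """Map one tuple to (row_key, {"family:qualifier": value}).

    First field -> row key; remaining fields -> listed columns, in order.
    This sketch rejects tuples whose length does not match the column list.
    """
    if len(fields) != len(columns) + 1:
        raise ValueError(f"expected {len(columns) + 1} fields, got {len(fields)}")
    row_key, values = fields[0], fields[1:]
    return row_key, dict(zip(columns, values))


def load_lines(lines: Iterable[str], columns: List[str], sep: str = ",") -> Dict[str, Dict[str, str]]:
    """Simulate loading delimited lines into an in-memory 'table' of rows."""
    rows: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        key, cells = tuple_to_hbase_row(line.split(sep), columns)
        # In this simulation a later value for the same row/column overwrites the earlier one.
        rows.setdefault(key, {}).update(cells)
    return rows


# Column list written the way it would be passed to HBaseStorage (hypothetical names).
COLUMNS = "cf:a cf:b".split()
```

For example, `load_lines(["r1,x,y", "r2,p,q"], COLUMNS)` yields two rows keyed `r1` and `r2`, each with `cf:a` and `cf:b` cells, which is the shape the Pig job would be expected to produce in the target table.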

Unfortunately, I don't know of any easier solution. Good luck, and I hope this helps a little.

Related questions

HBase table as MapReduce input?

MapReduce Input Output selectivity

HBase with MapReduce

Amazon Elastic MapReduce: Output directory

How to insert , MapReduce output into HBASE using same program

Mapreduce to hbase output stuck at map 100% reduce 100%

How to use a HBase secondary index table as and input in a MapReduce Job?

Synchronize data to HBase/HDFS and use it as input to MapReduce job

HBase mapreduce: write into HBase in Reducer

Do away with default output directory completely - MapReduce

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Answer 1 (posted 2014-08-08 13:51:16)
