
Scripted MapReduce with local directory input and HBase output

Sometimes I'd like to run a simple, light-weight MapReduce job. "Simple" means that it uses a very simple algorithm, and "light-weight" means that I can implement it in a few lines of some scripting language (or something similar).

My current task is to read data from files in a directory on the local filesystem, do minimal processing, and write the result to HBase. Hadoop Streaming can read from the local filesystem, but it cannot write to HBase. There is a hadoop-hbase-streaming project that claims to provide this functionality. Unfortunately, I couldn't get it to work; I guess that is because the last commit to its repository was in 2008. My task looks pretty common, so I wonder why the hadoop-hbase-streaming library has not been updated since 2008. I guess there are other ways to achieve this nowadays. Could you tell me what they are?
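To make "minimal processing" concrete, here is a minimal sketch of the kind of Streaming mapper I have in mind, in Python. The input layout (one `id,name,value` record per line) and the column names are assumptions made up for illustration, not my real data. The script only emits tab-separated text, as any Streaming mapper does; it does not write to HBase itself, which is exactly the missing piece.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: CSV line in, "row_key<TAB>column=value" lines out."""
import sys

# Hypothetical HBase column names; the assumed input layout is "id,name,value".
COLUMNS = ("info:name", "info:value")


def map_line(line):
    """Turn one input line into (row_key, column, value) triples.

    Malformed lines (wrong field count or empty id) yield an empty list.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != len(COLUMNS) + 1 or not fields[0]:
        return []
    row_key = fields[0]
    return [(row_key, col, val) for col, val in zip(COLUMNS, fields[1:])]


def run(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write one key/value line per column."""
    for line in stdin:
        for row_key, column, value in map_line(line):
            stdout.write(f"{row_key}\t{column}={value}\n")


if __name__ == "__main__":
    run()
```

Saved as, say, `mapper.py` (a hypothetical name), it can be checked without a cluster by piping a file through it: `cat some_file | python3 mapper.py`.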

I once wrote a MapReduce job that loaded data from the local filesystem into HBase on an old version of Hadoop (Hadoop 1, I do not remember the exact version), and I have now had to rewrite it because the Hadoop libraries are completely different (I am currently using CDH5.0.1). So I am not surprised that hadoop-hbase-streaming is not working. However, I found that the simplest and easiest way (for me) to upload data from a local directory into HBase is Pig. I tried this example and it worked perfectly for me:

Using Pig to Bulk Load Data Into HBase

Unfortunately, I do not know of any easier solution. Good luck, and I hope this helps a little.
