简体   繁体   中英

How to use hbase as a source for hadoop streaming jobs

Is there any way to use a Hbase table as a source for a Hadoop streaming job ? Specifically, I want to run a Hadoop streaming job written in Python. This works well when the input is specified as a folder on HDFS. But I've not been able to find any documentation about reading data from a Hbase table.

Is this supported ? Or I'll have to go through the ordeal of writing a java code for getting data from Hbase to HDFS first and then run streaming job ?

I'm using Hbase 0.94 from Cloudera.

(There is a similar question already present here . But it points to a third party solution, not actively contributed to. I was hoping that this will be supported in Hbase).

I would use Pig to load the data and then feed it into a streaming Python application.

See here: http://pig.apache.org/docs/r0.12.0/func.html#HBaseStorage http://pig.apache.org/docs/r0.12.0/basic.html#stream

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM