Is there any way to use an HBase table as the source for a Hadoop streaming job? Specifically, I want to run a Hadoop streaming job written in Python. This works well when the input is specified as a folder on HDFS, but I have not been able to find any documentation about reading data from an HBase table.
Is this supported? Or will I have to go through the ordeal of writing Java code to copy the data from HBase to HDFS first and then run the streaming job?
I'm using HBase 0.94 from Cloudera.
(A similar question has already been asked here, but it points to a third-party solution that is not actively maintained. I was hoping this would be supported in HBase itself.)
I would use Pig to load the data from HBase with HBaseStorage and then pipe it through a streaming Python application with the STREAM operator.
See here:
http://pig.apache.org/docs/r0.12.0/func.html#HBaseStorage
http://pig.apache.org/docs/r0.12.0/basic.html#stream
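As a rough sketch of what this could look like: the Python script below reads records from stdin, which is how Pig's STREAM operator hands data to an external program (by default one record per line, fields separated by tabs). The script name `mapper.py`, the table name `mytable`, the column `cf:value`, and the output path are all placeholders I made up for illustration, and the Pig snippet in the docstring is untested against HBase 0.94, so treat it as a starting point rather than a working recipe.

```python
#!/usr/bin/env python
"""Streaming script for Pig's STREAM operator (hypothetical name: mapper.py).

A Pig script along these lines could feed it. The table name 'mytable',
the column 'cf:value' and the output path are placeholders:

    raw = LOAD 'hbase://mytable'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
              'cf:value', '-loadKey true')
          AS (rowkey:chararray, value:chararray);
    DEFINE py_cmd `python mapper.py` SHIP('mapper.py');
    out = STREAM raw THROUGH py_cmd AS (rowkey:chararray, length:int);
    STORE out INTO 'output_dir';
"""
import sys


def transform(lines):
    """Yield 'rowkey<TAB>length of value' for each tab-separated input line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # Skip blank lines instead of emitting an empty record.
            continue
        # Split on the first tab only, so tabs inside the value are kept.
        rowkey, _, value = line.partition("\t")
        yield "%s\t%d" % (rowkey, len(value))


if __name__ == "__main__":
    # Pig writes input records to stdin and reads results from stdout.
    for out in transform(sys.stdin):
        print(out)
```

The same script can be tried locally without Pig or HBase, for example with `printf 'row1\thello\n' | python mapper.py`, since it only depends on stdin and stdout.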