
How to use hbase as a source for hadoop streaming jobs

Original post: 2014-03-28 05:23:49, 9 1, tags: python / hadoop / hbase / hadoop-streaming

Is there any way to use an HBase table as the source for a Hadoop streaming job? Specifically, I want to run a Hadoop streaming job written in Python. This works well when the input is specified as a folder on HDFS, but I have not been able to find any documentation about reading data from an HBase table.

Is this supported? Or will I have to go through the ordeal of writing Java code to copy the data from HBase to HDFS first and then run the streaming job?

I'm using HBase 0.94 from Cloudera.

(There is a similar question already present here. But it points to a third-party solution that is not actively maintained. I was hoping that this would be supported in HBase itself.)

1 answer (posted 2014-03-30 07:45:18)

I would use Pig to load the data from HBase with HBaseStorage and then feed it into a Python application through Pig's STREAM operator.

See here:
http://pig.apache.org/docs/r0.12.0/func.html#HBaseStorage
http://pig.apache.org/docs/r0.12.0/basic.html#stream
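As a rough illustration of that approach, here is a minimal sketch of the Python side. It assumes Pig's default streaming format (one record per line, fields separated by tabs) and that the row key is loaded as the first field. The table name `mytable`, the columns `cf:col1` and `cf:col2`, the file name `mapper.py`, and the per-row logic are all hypothetical placeholders, not something from the question. The Pig statements in the leading comment show how the script might be wired up and would need adapting to your schema.

```python
#!/usr/bin/env python
# Hypothetical mapper.py, meant to be driven by a Pig script along these lines
# (table, column, and output names are made up for illustration):
#
#   raw = LOAD 'hbase://mytable'
#         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
#             'cf:col1 cf:col2', '-loadKey true')
#         AS (rowkey:chararray, col1:chararray, col2:chararray);
#   DEFINE py_cmd `python mapper.py` SHIP('mapper.py');
#   out = STREAM raw THROUGH py_cmd AS (rowkey:chararray, n:int);
#   STORE out INTO 'output_dir';
import sys


def transform(line):
    """Turn one tab-separated input record into one tab-separated output record.

    Placeholder logic: emit the row key and the number of non-empty columns.
    """
    fields = line.rstrip("\n").split("\t")
    rowkey, values = fields[0], fields[1:]
    non_empty = sum(1 for v in values if v)
    return "%s\t%d" % (rowkey, non_empty)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Pig writes each tuple to stdin as one line and reads results from stdout.
    for line in stdin:
        if line.strip():
            stdout.write(transform(line) + "\n")


if __name__ == "__main__":
    main()
```

The script only reads stdin and writes stdout, so it can be tried locally by piping a tab-separated sample file through it before running it under Pig.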

Related questions

Map only jobs in spark (vs hadoop streaming)

Not able to execute Python based Hadoop Streaming jobs

Hadoop streaming jobs SUCCEEDED but killed by ApplicationMaster

How to use a file in a hadoop streaming job using python?

Can we cascade multiple MapReduce jobs in Hadoop Streaming (lang: Python)

How to deactivate output in Hadoop streaming?

Hadoop streaming with Bash - how slow?

How to read other files in hadoop jobs?

How to find the size of the dataframe in spark streaming jobs

How to run a MRJob in a local Hadoop Cluster with Hadoop Streaming?



