
How does HBase mapreduce job communicate with server? (newbie question)

I am new to Hadoop and HBase, and even though I've read a lot, I still don't understand the basic hierarchy and workflow of the MapReduce job API.

From what I understand, I need to use the Java API to implement certain classes and pass them to HBase, which will coordinate the splitting and distribution process. Is that correct?

If so, how does the application communicate with the server to hand over the code for the MapReduce job? I'm missing a link here...

Thanks

When you run your HBase MapReduce job, your classpath has to contain both the HBase and the Hadoop/MapReduce configuration files (hbase-site.xml, core-site.xml, mapred-site.xml and so on). These files contain settings such as the location of the JobTracker, the HDFS NameNode, and the ZooKeeper quorum through which the HBase client finds the master and the region servers. The runtime picks up all of these settings from the configuration files automatically, so your job knows which servers to contact.
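To make the "settings come from files on the classpath" point concrete, here is a minimal, stdlib-only sketch of what those files look like and how they are layered. It is not the real client code: in practice `HBaseConfiguration.create()` (built on Hadoop's `org.apache.hadoop.conf.Configuration`) does this for you and also handles details this sketch ignores, such as `final` properties and variable expansion. The class name `SiteConfigSketch` and the host names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: a simplified model of how Hadoop-style *-site.xml files are read.
class SiteConfigSketch {
    // Reads every <property><name>..</name><value>..</value></property> entry
    // of one configuration document, in document order.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid configuration document", e);
        }
    }

    // Loads documents in order; a later document overrides earlier values for the
    // same key, the way hbase-site.xml overrides the shipped hbase-default.xml.
    static Map<String, String> load(String... documents) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String document : documents) {
            merged.putAll(parse(document));
        }
        return merged;
    }
}
```

For example, loading a default document that sets `hbase.zookeeper.quorum` to `localhost` followed by a site document that sets it to `zk1.example.com` leaves the site value in effect. That layering is why the site files on your classpath, rather than anything in your code, decide which cluster the job talks to.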

I think you should just work through the basic tutorial, which should make things clear. I found the quickest way to get started was by playing with the Cloudera VM.

Also, I'm not sure about your reference to HBase; you should be passing Java classes to Hadoop, not HBase.

However, in an attempt to answer your question: Hadoop should be installed on all nodes in your cluster. The Hadoop framework takes care of farming the map and reduce tasks out to those nodes.
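"Farming tasks out to nodes" usually means trying to run each map task on a node that already holds its input data. The sketch below is a deliberately simplified model of that idea, written with plain Java collections: a locality-first pass, then a fallback pass onto any node with a free slot. It is an assumption-laden illustration, not Hadoop's scheduler (the JobTracker's FIFO, Fair and Capacity schedulers are considerably more involved). `LocalityAssignmentSketch`, `Task` and the slot map are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of locality-aware task placement.
class LocalityAssignmentSketch {
    // A map task plus the host that stores its input (for an HBase table,
    // roughly the region server hosting that region).
    static final class Task {
        final String id;
        final String preferredHost;

        Task(String id, String preferredHost) {
            this.id = id;
            this.preferredHost = preferredHost;
        }
    }

    // freeSlots: host -> number of free task slots on that node (not modified).
    // Returns taskId -> assigned host. Tasks that do not fit are left out;
    // in a real cluster they would wait until a slot frees up.
    static Map<String, String> assign(List<Task> tasks, Map<String, Integer> freeSlots) {
        Map<String, Integer> slots = new LinkedHashMap<>(freeSlots);
        Map<String, String> assignment = new LinkedHashMap<>();

        // Pass 1: run each task on the node that already holds its data, if it has room.
        for (Task task : tasks) {
            Integer free = slots.get(task.preferredHost);
            if (free != null && free > 0) {
                assignment.put(task.id, task.preferredHost);
                slots.put(task.preferredHost, free - 1);
            }
        }

        // Pass 2: place the remaining tasks on any node with spare capacity.
        for (Task task : tasks) {
            if (assignment.containsKey(task.id)) {
                continue;
            }
            for (Map.Entry<String, Integer> entry : slots.entrySet()) {
                if (entry.getValue() > 0) {
                    assignment.put(task.id, entry.getKey());
                    entry.setValue(entry.getValue() - 1);
                    break;
                }
            }
        }
        return assignment;
    }
}
```

With one slot each on hosts `a`, `b` and `c`, and tasks `t1` and `t2` both preferring `a` while `t3` prefers `b`, the sketch places `t1` on `a`, `t3` on `b`, and falls back to `c` for `t2`.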

The standard way to execute a MapReduce job that uses HBase is the same as for a non-HBase MapReduce job: ${HADOOP_HOME}/bin/hadoop jar .jar [args]

This submits your jar to the cluster: the client copies it into HDFS, and each TaskTracker that runs one of your tasks fetches it from there so that it can execute your code.

With HBase you will also typically use the HBase helper class TableMapReduceUtil: initTableMapperJob to read from a table, and initTableReducerJob if you also write the results back to a table.

The input side uses built-in logic to split the HBase table along its regions (one split per region), so that the computation can be distributed across map tasks. If you want a different split, you have to change the way splits are calculated (for example, by overriding getSplits in a TableInputFormat subclass), which means you cannot rely on the utility's default behavior.
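The region-based split can be illustrated with a short, self-contained model. It loosely follows the logic of HBase's `TableInputFormatBase.getSplits`: each region that overlaps the scan's row range becomes one split, clipped to that range. To stay dependency-free it assumes row keys are plain `String`s compared lexicographically (HBase actually compares `byte[]` keys) and uses the empty string for "unbounded", as HBase does for the first region's start key and the last region's end key. `RegionSplitSketch` and `Split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "one input split per region", clipped to the scan range.
class RegionSplitSketch {
    // A row range [startRow, stopRow); "" means unbounded on that side.
    static final class Split {
        final String startRow;
        final String stopRow;

        Split(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    // regionStartKeys must be sorted and begin with "" (the first region).
    // Region i covers [start_i, start_{i+1}); the last region has no end ("").
    static List<Split> computeSplits(List<String> regionStartKeys, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String regionStart = regionStartKeys.get(i);
            String regionEnd = i + 1 < regionStartKeys.size() ? regionStartKeys.get(i + 1) : "";

            // Skip regions that do not overlap the scan range at all.
            boolean scanStartsBeforeRegionEnd =
                    scanStart.isEmpty() || regionEnd.isEmpty() || scanStart.compareTo(regionEnd) < 0;
            boolean scanStopsAfterRegionStart =
                    scanStop.isEmpty() || scanStop.compareTo(regionStart) > 0;
            if (!(scanStartsBeforeRegionEnd && scanStopsAfterRegionStart)) {
                continue;
            }

            // Clip the region's boundaries to the scan's boundaries.
            String splitStart = scanStart.isEmpty() || regionStart.compareTo(scanStart) >= 0
                    ? regionStart : scanStart;
            String splitStop = !regionEnd.isEmpty()
                    && (scanStop.isEmpty() || regionEnd.compareTo(scanStop) <= 0)
                    ? regionEnd : scanStop;
            splits.add(new Split(splitStart, splitStop));
        }
        return splits;
    }
}
```

With region start keys `""`, `"g"` and `"p"`, a full-table scan yields three splits, one per region, while a scan from `"c"` to `"k"` yields only `[c, g)` and `[g, k)`; the third region is never touched. Changing how many map tasks you get means replacing this calculation, which is the point the answer makes.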

The other thing you can specify is a filter on the rows that are returned. If you use a built-in filter, you don't have to do anything special. However, if you want to write a custom comparator, you have to make sure that the region servers have this code on their classpath, because filters are evaluated on the region servers rather than in your client. Before you go this route, examine the built-in comparators carefully, as they are quite powerful.
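Why the region servers need your comparator class can be modeled in a few lines. Roughly speaking, the client does not ship executable code with a scan: it sends a description of the filter (its class name plus serialized arguments), and the region server rebuilds the filter from its own classpath before scanning. The sketch below stands in for that classpath with a registry of known filter factories; it is an illustrative model using plain Java collections, not HBase's actual `Filter` API, and `ServerSideFilterSketch`, `register` and the filter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of a region server that can only run filters it knows about.
class ServerSideFilterSketch {
    // Stands in for the region server's classpath: filter name -> factory
    // that turns the client's argument into a row predicate.
    private final Map<String, Function<String, Predicate<String>>> knownFilters = new HashMap<>();

    ServerSideFilterSketch() {
        // A "built-in" filter, loosely analogous to HBase's PrefixFilter.
        knownFilters.put("prefix", arg -> row -> row.startsWith(arg));
    }

    // Models deploying a custom filter class onto the server's classpath.
    void register(String name, Function<String, Predicate<String>> factory) {
        knownFilters.put(name, factory);
    }

    // The client sends only a filter name and argument; the server has to
    // rebuild the filter itself before it can evaluate rows.
    List<String> scan(List<String> rows, String filterName, String filterArg) {
        Function<String, Predicate<String>> factory = knownFilters.get(filterName);
        if (factory == null) {
            throw new IllegalStateException("Filter class not found on server: " + filterName);
        }
        Predicate<String> filter = factory.apply(filterArg);
        List<String> result = new ArrayList<>();
        for (String row : rows) {
            if (filter.test(row)) {
                result.add(row);
            }
        }
        return result;
    }
}
```

A scan with the built-in `"prefix"` filter works out of the box, whereas a scan naming a custom filter fails until that filter has been registered on the server, which mirrors the deployment step the answer warns about.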
