
Reading data from HBase in Reducer

I am new to Hadoop and HBase. Let me explain my question with an example. The data is kept small for brevity.

Let's assume we have a file named item.log that contains the following information.

 ITEM-1,PRODUCT-1
 ITEM-2,PRODUCT-1
 ITEM-3,PRODUCT-2
 ITEM-4,PRODUCT-2
 ITEM-5,PRODUCT-3
 ITEM-6,PRODUCT-1
 ITEM-7,PRODUCT-1
 ITEM-8,PRODUCT-2
 ITEM-9,PRODUCT-1

I have the MapReduce code below:

package org.sanjus.hadoop;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class ProductMapReduce {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {

        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            String[] columns = value.toString().split(",");

            if (columns.length != 2) {
                System.out.println("Bad line/value " + value);
                return;
            }

            Text word = new Text(columns[1]);
            LongWritable counter = new LongWritable(1L);

            output.collect(word, counter);
        }
    }


    public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {

        public void reduce(Text key, Iterator<LongWritable> iterator, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            long sum = 0L;

            while (iterator.hasNext()) {
                sum += iterator.next().get();
            }
            output.collect(key, new LongWritable(sum));
        }

    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(ProductMapReduce.class);
        conf.setJobName("Product Analyzer");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}

LABEL 1: The output after MapReduce is below:

 PRODUCT-1 5
 PRODUCT-2 3
 PRODUCT-3 1

Here is the question:

I have a table in HBase with the following information:

 PRODUCT-1 10$
 PRODUCT-2 20$
 PRODUCT-3 30$

Question/Requirement: I want the output of the reduce phase to be a consolidation of the reduce output shown under "LABEL 1:" and the HBase table stated above:

 PRODUCT-1 10$ * 5 = 50$
 PRODUCT-2 20$ * 3 = 60$
 PRODUCT-3 30$ * 1 = 30$

Basically, the key is PRODUCT-1, the value in the HBase table for this key is 10$, the value for the same key from the reducer is 5, and the two values are multiplied. (The $ symbol is just for readability.)
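To make that concrete, assume the price is stored as a plain string under a single column, say cf:price, in a table named MY_TABLE (the family and qualifier names here are just placeholders for whatever the real table uses). Purely as an illustration, such a table could be populated with the HBase client API like this:

 // Illustration only: assumed layout of MY_TABLE -- row key = product id,
 // price stored as a string under the placeholder column cf:price.
 HTable table = new HTable(HBaseConfiguration.create(), "MY_TABLE");
 String[][] prices = { {"PRODUCT-1", "10"}, {"PRODUCT-2", "20"}, {"PRODUCT-3", "30"} };
 for (String[] row : prices) {
     Put put = new Put(Bytes.toBytes(row[0]));
     put.add(Bytes.toBytes("cf"), Bytes.toBytes("price"), Bytes.toBytes(row[1]));
     table.put(put);
 }
 table.close();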

Note: The examples I found are based on using HBase as the input or the output. In my scenario, the input and output are files in HDFS, while I need to enrich the reducer output with information from an HBase table.

Since HBase supports high read throughput and you only want to read data in the reducer (a controlled number of reducers will be used), you can use the HBase API to read data from the table based on the reducer's key. Since reads in HBase are fast (roughly 10 ms, depending on the size of the data fetched), I do not think your performance will be impacted. Just make sure you initialize the Configuration and HTable in the configure() method of the reducer.
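For example, a minimal sketch of that approach with the same old org.apache.hadoop.mapred API as your job could look like the following. The class name is just for the sketch; the table name MY_TABLE, the column family cf and the qualifier price are assumptions, as is the price being stored as a plain string such as "10" -- substitute whatever your table actually uses.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PriceJoinReduce extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, Text> {

    private HTable htable;

    @Override
    public void configure(JobConf job) {
        try {
            Configuration hbaseConf = HBaseConfiguration.create();
            htable = new HTable(hbaseConf, "MY_TABLE");           // assumed table name
        } catch (IOException e) {
            throw new RuntimeException("Could not open HBase table", e);
        }
    }

    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        long count = 0L;
        while (values.hasNext()) {
            count += values.next().get();                         // same sum as your Reduce
        }

        // One point read per reducer key; the row key is the product id (e.g. "PRODUCT-1").
        Result result = htable.get(new Get(Bytes.toBytes(key.toString())));
        byte[] raw = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("price")); // assumed column
        if (raw == null) {
            return;                                               // no price for this product
        }
        long price = Long.parseLong(Bytes.toString(raw));

        output.collect(key, new Text(price * count + "$"));       // e.g. PRODUCT-1 -> 50$
    }

    @Override
    public void close() throws IOException {
        htable.close();                                           // release the HBase connection
    }
}

Note that because this reducer emits Text values instead of LongWritable, it can no longer double as the combiner: keep your original summing Reduce as the combiner (or drop it), call conf.setMapOutputValueClass(LongWritable.class), and change conf.setOutputValueClass(...) to Text.class.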

This is what I did. Inside my reducer class, I overrode the setup() method:

private HTable htable;

private Configuration config;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    config = HBaseConfiguration.create();   // assign the field; a local variable here would shadow it
    config.addResource(new Path("/etc/hbase/conf.hbase1/hbase-site.xml"));
    try {
        htable = new HTable(config, "MY_TABLE");
    }
    catch (IOException e) {
        // System.out.println has no (String, Throwable) overload; log the cause and fail the task
        System.out.println("Error getting table from HBase: " + e.getMessage());
        throw e;
    }
}

Using the HTable.get API, I got the Result object back.
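To turn that Result into the final output, here is a sketch of a matching reduce() and cleanup() for the new org.apache.hadoop.mapreduce API (which is where setup(Context) comes from; the old mapred API used in the job above would use configure(JobConf) and close() instead). It assumes the htable field from the setup() snippet, a reducer declared as Reducer<Text, LongWritable, Text, Text>, and again that the price lives as a string under the placeholder column cf:price:

@Override
protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
    long count = 0L;
    for (LongWritable value : values) {
        count += value.get();               // same per-product count as before
    }

    // Look up the row for this product and pull out the assumed cf:price column.
    Result result = htable.get(new Get(Bytes.toBytes(key.toString())));
    byte[] raw = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("price"));
    if (raw == null) {
        return;                             // product missing from MY_TABLE -- skip it
    }
    long price = Long.parseLong(Bytes.toString(raw));

    context.write(key, new Text(price * count + "$"));   // e.g. PRODUCT-1 -> 50$
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    htable.close();                         // free the connection opened in setup()
}

Since this reducer uses the new API, the job itself also has to be driven through the new-API Job class rather than JobConf; the two APIs cannot be mixed within one job.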
