Hadoop mapreduce - reducer not running

I am trying to customize a bulk-load MapReduce job into HBase, and I ran into issues with the reducer. At first I thought I hadn't written the reducer well, but after throwing a runtime exception in the reducer and seeing the job still complete, I realized that the reducer is not running at all. So far I don't see anything wrong with respect to the common answers to this problem:

  1. My configuration sets the map output classes and the final output classes separately
  2. My reducer and mapper have @Override annotations
  3. I have Iterable<Put>, and my reducer input is (ImmutableBytesWritable, Put), so...

Here's my code:

Driver

public int run(String[] args) throws Exception {
    int result = 0;
    String outputPath = args[1];
    Configuration configuration = getConf();
    configuration.set("data.seperator", DATA_SEPERATOR);
    configuration.set("hbase.table.name", TABLE_NAME);
    configuration.set("COLUMN_FAMILY_1", COLUMN_FAMILY_1);
    Job job = new Job(configuration);
    job.setJarByClass(HBaseBulkLoadDriver.class);
    job.setJobName("Bulk Loading HBase Table::" + TABLE_NAME);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapperClass(HBaseBulkLoadMapper.class);
    job.setReducerClass(HBaseBulkLoadReducer.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    FileInputFormat.addInputPaths(job, args[0]);
    FileSystem.getLocal(getConf()).delete(new Path(outputPath), true);
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.setMapOutputValueClass(Put.class);
    job.setNumReduceTasks(1);
    HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
    job.waitForCompletion(true);
    return result;
}

Mapper

public class HBaseBulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private String hbaseTable;
    private String dataSeperator;
    private String columnFamily1;
    private ImmutableBytesWritable hbaseTableName;

    public void setup(Context context) {
        Configuration configuration = context.getConfiguration();
        hbaseTable = configuration.get("hbase.table.name");
        dataSeperator = configuration.get("data.seperator");
        columnFamily1 = configuration.get("COLUMN_FAMILY_1");
        hbaseTableName = new ImmutableBytesWritable(Bytes.toBytes(hbaseTable));
    }
    @Override
    public void map(LongWritable key, Text value, Context context) {
        try {
            String[] values = value.toString().split(dataSeperator);
            String rowKey = values[0];
            Put put = new Put(Bytes.toBytes(rowKey));
            BUNCH OF ADDS;
            context.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put);
        } catch(Exception exception) {
            exception.printStackTrace();
        }
    }
}

Reducer

public class HBaseBulkLoadReducer extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, Put> {
    @Override
    protected void reduce(
            ImmutableBytesWritable row,
            Iterable<Put> puts,
            Reducer<ImmutableBytesWritable, Put,
                    ImmutableBytesWritable, Put>.Context context)
            throws java.io.IOException, InterruptedException {
        TreeMap<String, KeyValue> map = new TreeMap<String, KeyValue>();
        int count = 0;
        Append nkv;
        byte[] tmp = "".getBytes();
        Put pp = new Put(tmp);
        try {
            for (Put p : puts) {
                byte[] r = "".getBytes();
                //KeyValue kv = new KeyValue(r);
                if (count != 0) {
                    r = p.getRow();
                    pp.add(new KeyValue(r));
                    //KeyValue k = map.get(row.toString());
                    //nkv = new Append(k.getRowArray());
                    //nkv=nkv.add(kv);
                    //map.put(row.toString(), k.clone());
                    //context.write(row,nkv);
                    //tmp=ArrayUtils.addAll(tmp,kv.getValueArray());
                    //map.put(row.toString(),new KeyValue(kv.getRowArray(),kv.getFamilyArray(),kv.getQualifierArray(),tmp));
                    count++;
                    throw new RuntimeException();
                } else {
                    r = p.getRow();
                    pp = new Put(row.toString().getBytes());
                    pp.add(new KeyValue(r));
                    //tmp=kv.clone().getValueArray();
                    //nkv = new Append(kv.getRowArray());
                    //map.put(row.toString(), kv.clone());
                    count++;
                    throw new RuntimeException();
                }
            }
            context.write(row, pp);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Well, I know the reducer is kind of messy, but the thing is, it throws a RuntimeException in both the if and else branches as you can see, and the bulk load still succeeds, so I am quite sure the reducer is not running - and I am not sure why. All three files are packaged by Maven into the same directory, FYI.
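
One way to check which reducer the job will actually run (a small debugging sketch, not part of my original code, assuming the job object from the driver above) is to print it just before submission:

// Hypothetical debug line, added for illustration: prints the reducer class the
// job is configured with. If something has silently replaced the custom class,
// it will not show up here.
System.out.println("Reducer class: " + job.getReducerClass().getName());
job.waitForCompletion(true);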

Figured out what was wrong. configureIncrementalLoad sets the reducer class to PutSortReducer or KeyValueSortReducer according to the map output value class, so if I want to use a custom reducer class I have to set it after configureIncrementalLoad. After that I could see the reducer running. Just answering my own question so it may help people who run into the same problem.

HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration,TABLE_NAME));
job.setReducerClass(HBaseBulkLoadReducer.class);
job.waitForCompletion(true);
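
For reference, a minimal sketch of how the tail of run() can look with this ordering (same class names as above; the 0/1 return mapping is just an assumption, not from my original code):

// configureIncrementalLoad wires the job for HFile output and, because the map
// output value class is Put, sets the reducer to PutSortReducer.
HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
// Re-set the custom reducer *after* configureIncrementalLoad so it is not overridden.
job.setReducerClass(HBaseBulkLoadReducer.class);
return job.waitForCompletion(true) ? 0 : 1;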
