Hadoop mapreduce - reducer not running

I am trying to customize a bulk-load MapReduce job into HBase, and I ran into issues with the reducer. At first I thought I hadn't written the reducer well, but after throwing a runtime exception in the reducer and seeing the job still complete, I realized that the reducer is not running at all. So far I don't see anything wrong with respect to the common answers to this problem:

  1. My configuration sets the map output classes and the final output classes separately
  2. My reducer and mapper have @Override annotations
  3. I have Iterable<Put>, and my reducer input is (ImmutableBytesWritable, Put), so...

Here's my code:

Driver

public int run(String[] args) throws Exception {
    int result = 0;
    String outputPath = args[1];
    Configuration configuration = getConf();
    configuration.set("data.seperator", DATA_SEPERATOR);
    configuration.set("hbase.table.name", TABLE_NAME);
    configuration.set("COLUMN_FAMILY_1", COLUMN_FAMILY_1);
    Job job = new Job(configuration);
    job.setJarByClass(HBaseBulkLoadDriver.class);
    job.setJobName("Bulk Loading HBase Table::" + TABLE_NAME);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapperClass(HBaseBulkLoadMapper.class);
    job.setReducerClass(HBaseBulkLoadReducer.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Put.class);
    FileInputFormat.addInputPaths(job, args[0]);
    FileSystem.getLocal(getConf()).delete(new Path(outputPath), true);
    FileOutputFormat.setOutputPath(job, new Path(outputPath));
    job.setMapOutputValueClass(Put.class);
    job.setNumReduceTasks(1);
    HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
    job.waitForCompletion(true);
    return result;
}

Mapper

public class HBaseBulkLoadMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private String hbaseTable;
    private String dataSeperator;
    private String columnFamily1;
    private ImmutableBytesWritable hbaseTableName;

    public void setup(Context context) {
        Configuration configuration = context.getConfiguration();
        hbaseTable = configuration.get("hbase.table.name");
        dataSeperator = configuration.get("data.seperator");
        columnFamily1 = configuration.get("COLUMN_FAMILY_1");
        hbaseTableName = new ImmutableBytesWritable(Bytes.toBytes(hbaseTable));
    }
    @Override
    public void map(LongWritable key, Text value, Context context) {
        try {
            String[] values = value.toString().split(dataSeperator);
            String rowKey = values[0];
            Put put = new Put(Bytes.toBytes(rowKey));
            BUNCH OF ADDS;
            context.write(new ImmutableBytesWritable(Bytes.toBytes(rowKey)), put);
        } catch(Exception exception) {
            exception.printStackTrace();
        }
    }
}

Reducer

public class HBaseBulkLoadReducer extends Reducer<ImmutableBytesWritable, Put, ImmutableBytesWritable, Put> {
    @Override
    protected void reduce(
            ImmutableBytesWritable row,
            Iterable<Put> puts,
            Reducer<ImmutableBytesWritable, Put,
                    ImmutableBytesWritable, Put>.Context context)
            throws java.io.IOException, InterruptedException {
        TreeMap<String, KeyValue> map = new TreeMap<String, KeyValue>();
        int count = 0;
        Append nkv;
        byte[] tmp = "".getBytes();
        Put pp = new Put(tmp);
        try {
            for (Put p : puts) {
                byte[] r = "".getBytes();
                //KeyValue kv = new KeyValue(r);
                if (count != 0) {
                    r = p.getRow();
                    pp.add(new KeyValue(r));
                    //KeyValue k = map.get(row.toString());
                    //nkv = new Append(k.getRowArray());
                    //nkv=nkv.add(kv);
                    //map.put(row.toString(), k.clone());
                    //context.write(row,nkv);
                    //tmp=ArrayUtils.addAll(tmp,kv.getValueArray());
                    //map.put(row.toString(),new KeyValue(kv.getRowArray(),kv.getFamilyArray(),kv.getQualifierArray(),tmp));
                    count++;
                    throw new RuntimeException();
                } else {
                    r = p.getRow();
                    pp = new Put(row.toString().getBytes());
                    pp.add(new KeyValue(r));
                    //tmp=kv.clone().getValueArray();
                    //nkv = new Append(kv.getRowArray());
                    //map.put(row.toString(), kv.clone());
                    count++;
                    throw new RuntimeException();
                }
            }
            context.write(row, pp);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Well, I know the reducer is kind of messy, but the thing is, it throws a RuntimeException in both the if and else branches as you can see, and the bulk load still succeeds, so I am quite sure the reducer is not running - and I am not sure why. All three files are packaged by Maven into the same directory, FYI.
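
One way to check which reducer the job will actually run (a small debugging sketch, not part of my original code, assuming the job object from the driver above) is to print it just before submission:

// Hypothetical debug line, added for illustration: prints the reducer class the
// job is configured with. If something has silently replaced the custom class,
// it will not show up here.
System.out.println("Reducer class: " + job.getReducerClass().getName());
job.waitForCompletion(true);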

Figured out what was wrong. configureIncrementalLoad sets the reducer class to PutSortReducer or KeyValueSortReducer according to the map output value class, so if I want to use a custom reducer class I have to set it after configureIncrementalLoad. After that I could see the reducer running. Just answering my own question so it may help people who run into the same problem.

HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration,TABLE_NAME));
job.setReducerClass(HBaseBulkLoadReducer.class);
job.waitForCompletion(true);
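
For reference, a minimal sketch of how the tail of run() can look with this ordering (same class names as above; the 0/1 return mapping is just an assumption, not from my original code):

// configureIncrementalLoad wires the job for HFile output and, because the map
// output value class is Put, sets the reducer to PutSortReducer.
HFileOutputFormat.configureIncrementalLoad(job, new HTable(configuration, TABLE_NAME));
// Re-set the custom reducer *after* configureIncrementalLoad so it is not overridden.
job.setReducerClass(HBaseBulkLoadReducer.class);
return job.waitForCompletion(true) ? 0 : 1;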
