Combiner creates a map-output file per region in an HBase scan MapReduce job

Question

Hi, I am running an application that reads records from HBase and writes them to text files.

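I use both a combiner and a custom partitioner in the application. I configured 41 reducers because I need to produce 40 reducer output files that satisfy the condition in my custom partitioner.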

Everything works, but when I use the combiner, it creates a map-output file per region, that is, per mapper.

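For example, my table has 40 regions, so 40 mappers are launched and 40 map-output files are created. The reducers, however, do not merge all the map output into the final reducer output files, of which there should be 40.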

The data in the files is correct, but the number of files has increased.

Any idea how I can get only the reducer output files?

// Reducer Class
    job.setCombinerClass(CommonReducer.class);
    job.setReducerClass(CommonReducer.class); // reducer class

Below are my job details

Submitted:  Mon Apr 10 09:42:55 CDT 2017
Started:    Mon Apr 10 09:43:03 CDT 2017
Finished:   Mon Apr 10 10:11:20 CDT 2017
Elapsed:    28mins, 17sec
Diagnostics:    
Average Map Time    6mins, 13sec
Average Shuffle Time    17mins, 56sec
Average Merge Time  0sec
Average Reduce Time     0sec

Here is my reducer logic

import java.io.IOException;
import org.apache.log4j.Logger;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class CommonCombiner extends Reducer<NullWritable, Text, NullWritable, Text> {

    private Logger logger = Logger.getLogger(CommonCombiner.class);
    private MultipleOutputs<NullWritable, Text> multipleOutputs;
    String strName = "";
    private static final String DATA_SEPERATOR = "\\|\\!\\|";

    public void setup(Context context) {
        logger.info("Inside Combiner.");
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(NullWritable Key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

        for (Text value : values) {
            final String valueStr = value.toString();
            StringBuilder sb = new StringBuilder();
            if (strName.isEmpty()) {
                String[] strArrFileName = valueStr.split(DATA_SEPERATOR);
                String strFullFileName[] = strArrFileName[1].split("\\|\\^\\|");

                strName = strFullFileName[strFullFileName.length - 1];


                String strArrvalueStr[] = valueStr.split(DATA_SEPERATOR);
                if (!strArrvalueStr[0].contains(HbaseBulkLoadMapperConstants.FF_ACTION)) {
                    sb.append(strArrvalueStr[0] + "|!|");
                }
                multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
                context.getCounter(Counters.FILE_DATA_COUNTER).increment(1);


            }

        }
    }


    public void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }
}

Answer 1

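I replaced multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName); with

context.write()

and I now get the correct output. Presumably this is because a combiner runs as part of the map task: MultipleOutputs writes its files directly from that task, so the records never go through the shuffle to the reducers, whereas context.write() passes them on as regular map output.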
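Switching to context.write() only changes where the record goes; the per-record parsing from the question stays the same. As a rough sketch, that parsing can be pulled into a plain helper that runs without a cluster. The class name RecordFields, the method names, the actionMarker parameter (a placeholder for HbaseBulkLoadMapperConstants.FF_ACTION, whose value is not shown in the question) and the sample record are all assumptions made for illustration, not part of the original job.

```java
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the string handling in the question's reduce().
public class RecordFields {
    // Same separators as in the question, quoted so they match literally.
    private static final String DATA_SEPARATOR = Pattern.quote("|!|");
    private static final String NAME_SEPARATOR = Pattern.quote("|^|");

    /** Returns the last "|^|"-separated piece of the second "|!|" field. */
    public static String fileName(String record) {
        String[] fields = record.split(DATA_SEPARATOR);
        String[] nameParts = fields[1].split(NAME_SEPARATOR);
        return nameParts[nameParts.length - 1];
    }

    /**
     * Returns the first field followed by "|!|", or an empty string when the
     * first field contains actionMarker (assumed stand-in for FF_ACTION).
     */
    public static String payload(String record, String actionMarker) {
        String first = record.split(DATA_SEPARATOR)[0];
        return first.contains(actionMarker) ? "" : first + "|!|";
    }

    public static void main(String[] args) {
        String record = "row1|!|a|^|b|^|part-A"; // made-up sample record
        System.out.println(fileName(record));
        System.out.println(payload(record, "FF_ACTION"));
        // In the combiner, the result would then be sent through the shuffle:
        // context.write(NullWritable.get(), new Text(payload(record, marker)));
    }
}
```

With the helper in place, the combiner body reduces to a single context.write() call per record. If named output files are still needed, MultipleOutputs would be used only in the reducer.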
