
Empty output files using MapReduce MultipleOutputs

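I am using MultipleOutputs in my Reducer because I want a separate result file for each key. However, every one of those per-key files is empty, even though the default result file part-r-xxxx is created and contains the correct values.

Here is my JobDriver and Reducer code.

Main class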

public static void main(String[] args) throws Exception {
    int currentIteration = 0;
    int reducerCount, roundCount;

    Configuration conf = createConfiguration(currentIteration);
    cleanEnvironment(conf);
    Job job = Job.getInstance(conf, "cfim"); // the Job(conf, name) constructor is deprecated

    //Input and output format configuration
    job.setMapperClass(TransactionsMapper.class);
    job.setReducerClass(PatriciaReducer.class);

    job.setInputFormatClass(TransactionInputFormat.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);

    job.setOutputKeyClass(LongWritable.class); // matches the key type PatriciaReducer emits
    job.setOutputValueClass(Text.class);

    reducerCount = roundCount = Math.floorDiv(getRoundCount(conf), Integer.parseInt(conf.get(MRConstants.mergeFactorSpecifier)));

    FileInputFormat.addInputPath(job, new Path("/home/cloudera/datasets/input"));
    Path outputPath = new Path(String.format(MRConstants.outputPathFormat, outputDir, currentIteration));
    FileOutputFormat.setOutputPath(job, outputPath);
    MultipleOutputs.addNamedOutput(job, "key", TextOutputFormat.class, LongWritable.class, Text.class);

    job.waitForCompletion(true);
}

Reducer class

public class PatriciaReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

private ITreeManager treeManager;
private SerializationManager serializationManager;
private MultipleOutputs<LongWritable, Text> mos;

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    treeManager = new PatriciaTreeManager();
    serializationManager = new SerializationManager();
    mos = new MultipleOutputs<LongWritable, Text>(context);
}

@Override
protected void reduce(LongWritable key, Iterable<Text> items, Context context)
        throws IOException, InterruptedException {

    Iterator<Text> patriciaIterator = items.iterator();
    PatriciaTree tree = new PatriciaTree();

    if (patriciaIterator.hasNext()){
        Text input = patriciaIterator.next();
        tree = serializationManager.deserializePatriciaTree(input.toString());
    }

    while(patriciaIterator.hasNext()){
        Text input = patriciaIterator.next();
        PatriciaTree mergeableTree = serializationManager.deserializePatriciaTree(input.toString());
        tree = treeManager.mergeTree(tree, mergeableTree, false);
    }

    Text outputValue = new Text(serializationManager.serializeAsJson(tree));
    mos.write("key", key, outputValue, generateOutputPath(key));
    context.write(key, outputValue);
}

@Override
protected void finalize() throws Throwable {
    // TODO Auto-generated method stub
    super.finalize();
    mos.close();
}

private String generateOutputPath(LongWritable key) throws IOException {
    String outputPath = String.format("%s-%s", MRConstants.reduceResultValue, key.toString());
    return outputPath;
}   

}

Am I doing something wrong?

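I found that I was closing the MultipleOutputs object in the wrong method. After closing MultipleOutputs in the cleanup method instead of the finalize method, everything works fine. The framework calls cleanup once at the end of each reduce task, whereas finalize is only run by the garbage collector, if at all, so the named outputs were never closed and their buffered records never reached the files.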
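To illustrate why the close has to happen in a hook the framework actually calls, here is a minimal Hadoop-free sketch. It is not the original job: CleanupDemo, SideOutputReducer, runTask and releaseForDemo are hypothetical names, and a plain BufferedWriter stands in for MultipleOutputs on the assumption that both hold records in a buffer until they are closed. The comment above cleanup() shows what the corresponding change in the real reducer would look like.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hadoop-free stand-in for the reducer lifecycle; every name here is hypothetical.
class CleanupDemo {

    /** Mimics a reducer whose side output stays buffered until close() is called. */
    static class SideOutputReducer {
        private final Path target;
        private final boolean closeInCleanup;
        private BufferedWriter out; // plays the role of the MultipleOutputs field "mos"

        SideOutputReducer(Path target, boolean closeInCleanup) {
            this.target = target;
            this.closeInCleanup = closeInCleanup;
        }

        void setup() throws IOException {
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }

        void reduce(long key, String value) throws IOException {
            out.write(key + "\t" + value + "\n"); // stays in memory for small outputs
        }

        // Hook invoked by the "framework" at the end of the task.
        // In the Hadoop reducer the equivalent change would be:
        //   @Override
        //   protected void cleanup(Context context) throws IOException, InterruptedException {
        //       mos.close();
        //   }
        void cleanup() throws IOException {
            if (closeInCleanup) {
                out.close();
            }
        }

        // Demo-only tidy-up so the temp file can be deleted; not part of the lifecycle.
        void releaseForDemo() throws IOException {
            if (out != null) {
                out.close();
            }
        }
    }

    /** Plays the framework: setup, reduce per key, cleanup. Returns the bytes on disk right after cleanup. */
    static long runTask(boolean closeInCleanup) throws IOException {
        Path file = Files.createTempFile("reduceResult-", ".txt");
        SideOutputReducer reducer = new SideOutputReducer(file, closeInCleanup);
        try {
            reducer.setup();
            reducer.reduce(1L, "tree-a");
            reducer.reduce(2L, "tree-b");
            reducer.cleanup();
            return Files.size(file); // measured before the demo-only close below
        } finally {
            reducer.releaseForDemo();
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("closed in cleanup: " + runTask(true) + " bytes");
        System.out.println("never closed:      " + runTask(false) + " bytes");
    }
}
```

When the writer is closed in cleanup, the file holds both records by the time the task ends. When the close is left to a method the framework never calls, the file is still empty at that point, which is the symptom described in the question.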

