Hadoop map reduce always writes the same values

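I'm trying to run a simple map reduce program in which the mapper writes two different values for the same key, but by the time they reach the reducer they are always the same.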

Here is my code:

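import java.io.IOException;
import java.util.Vector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class kaka {

    public static class Mapper4 extends Mapper<Text, Text, Text, Text> {
        @Override
        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // ignore the input record and emit two different values for the same key
            context.write(new Text("a"), new Text("b"));
            context.write(new Text("a"), new Text("c"));
        }
    }

    public static class Reducer4 extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Vector<Text> vals = new Vector<Text>();
            for (Text val : values) {
                vals.add(val);
            }
            // when inspected here, every element of vals holds the same value
        }
    }

    public static void main(String[] args) throws Exception {
        //deleteDir(new File("eran"));//todo
        Configuration conf = new Configuration();
        conf.set("mapred.map.tasks", "10"); // asking for more mappers (it's a recommendation)
        conf.set("mapred.max.split.size", "1000000"); // maximum input split size in bytes (here about 1 MB)

        Job job1 = new Job(conf, "find most similar words");
        job1.setJarByClass(kaka.class);
        job1.setInputFormatClass(SequenceFileInputFormat.class);
        job1.setMapperClass(Mapper4.class);
        job1.setReducerClass(Reducer4.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job1, new Path("vectors/part-r-00000"));
        FileOutputFormat.setOutputPath(job1, new Path("result"));
        // submit the job once and use its result as the exit code
        System.exit(job1.waitForCompletion(true) ? 0 : 1);
    }

}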

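You're being bitten by object reuse when iterating over the values in the reducer. A long time ago there was a JIRA patch to improve efficiency, which means that the Key/Value objects passed to the mapper and the Key/Value objects passed to the reducer are always the same underlying object references; only the contents of those objects change with each iteration. Because Reducer4 adds that one reference to the Vector on every pass through the loop, all of its elements point to the same Text instance, which holds whatever value was read last.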
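To see the effect without a cluster, here is a Hadoop-free simulation. It assumes only what is described above: the values iterator refills one shared mutable object on each call to next() and returns that same reference. StringBuilder stands in for Text, and ReuseDemo, ReusingIterator and collectWithoutCopy are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hadoop-free simulation of a values iterator that reuses one mutable object.
// ReuseDemo, ReusingIterator and collectWithoutCopy are hypothetical names;
// StringBuilder is a stand-in for Hadoop's Text.
final class ReuseDemo {

    /** Mimics an iterator that deserializes every record into one shared buffer. */
    static final class ReusingIterator implements Iterator<StringBuilder> {
        private final Iterator<String> source;
        private final StringBuilder shared = new StringBuilder(); // the single reused instance

        ReusingIterator(List<String> records) {
            this.source = records.iterator();
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public StringBuilder next() {
            shared.setLength(0);          // drop the previous record's contents
            shared.append(source.next()); // overwrite them with the next record
            return shared;                // always the same reference
        }
    }

    /** Stores the returned references as-is, like the loop in Reducer4. */
    static List<String> collectWithoutCopy(List<String> records) {
        List<StringBuilder> vals = new ArrayList<>();
        Iterator<StringBuilder> it = new ReusingIterator(records);
        while (it.hasNext()) {
            vals.add(it.next());
        }
        // Read the stored elements only after the loop, as a debugger would.
        List<String> seen = new ArrayList<>();
        for (StringBuilder sb : vals) {
            seen.add(sb.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy(List.of("b", "c"))); // prints [c, c]
    }
}
```

Both stored elements are the same object, so both show the last record, which matches the symptom in the question.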

Modify your code to copy each value before adding it to the Vector:

public static class Reducer4 extends Reducer<Text,Text,Text,Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        Vector<Text> vals = new Vector<Text>(); 
        for (Text val : values){
            // make copy of val before adding to the Vector
            vals.add(new Text(val));
        }

        return;
    }
}
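The effect of the copy can be checked in the same Hadoop-free way. This sketch assumes a hypothetical Holder class as a stand-in for Text: the "framework" overwrites one Holder in place through set(), and Holder's copy constructor plays the role of new Text(val). CopyFixDemo and collect are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free sketch of the copy fix. Holder is a hypothetical stand-in for
// Text: mutable through set(), with a copy constructor like Text(Text).
final class CopyFixDemo {

    static final class Holder {
        private String contents = "";

        Holder() {}

        Holder(Holder other) {        // copies the contents, like new Text(val)
            this.contents = other.contents;
        }

        void set(String s) {
            this.contents = s;
        }

        @Override
        public String toString() {
            return contents;
        }
    }

    /** Feeds records through one reused Holder; stores copies only if copy is true. */
    static List<String> collect(List<String> records, boolean copy) {
        Holder reused = new Holder(); // the instance the "framework" keeps refilling
        List<Holder> vals = new ArrayList<>();
        for (String record : records) {
            reused.set(record);       // overwrite the same object in place
            vals.add(copy ? new Holder(reused) : reused);
        }
        List<String> seen = new ArrayList<>();
        for (Holder h : vals) {
            seen.add(h.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + collect(List.of("b", "c"), false)); // [c, c]
        System.out.println("with copy:    " + collect(List.of("b", "c"), true));  // [b, c]
    }
}
```

If the reducer only needs the values as strings, storing val.toString() instead of a Text copy also avoids the problem, because each call returns a new String.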
