Hadoop: Reduce is not producing the desired output, it is the same as the map output

Here is my Map

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -20);
            String country = fields[4];
            String numClaims = fields[8];
            if (numClaims.length() > 0 && !numClaims.startsWith("\"")) {
                context.write(new Text(country), new Text(numClaims + ",1"));
            }
        }
    }

Here is my Reduce

    public void reduce(Text key, Iterator<Text> values, Context context) throws IOException, InterruptedException {
        double sum = 0.0;
        int count = 0;

        while (values.hasNext()) {
            String[] fields = values.next().toString().split(",");
            sum += Double.parseDouble(fields[0]);
            count += Integer.parseInt(fields[1]);
        }

        context.write(new Text(key), new DoubleWritable(sum/count));
    }

Here is how it is configured

    Job job = new Job(getConf());

    job.setJarByClass(AverageByAttributeUsingCombiner.class);
    job.setJobName("AverageByAttributeUsingCombiner");

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(MapClass.class);
    // job.setCombinerClass(Combinber.class);
    job.setReducerClass(Reduce.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // job.setNumReduceTasks(0); // to not run the reducer
    boolean success = job.waitForCompletion(true);
    return success ? 0 : 1;

The input has the form

    "PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
    3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
    3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
    3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
    3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
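One quick check needs no Hadoop at all: run the mapper's parsing logic on the sample rows above. The sketch below copies the body of MapClass.map() into a plain method. ParseCheck and mapLine are made-up names, and HYPOTHETICAL_ROW is an invented row, not taken from the dataset, with a CLAIMS value of 5.

The sketch shows three things:
- fields[4] is the COUNTRY column with its double quotes still attached, which is why the output keys appear as "AR" in quotes.
- The header row is skipped by the startsWith check.
- The four sample rows are skipped because their CLAIMS column (fields[8]) is empty.

```java
// Hadoop-free check of MapClass.map()'s parsing; ParseCheck and mapLine are made-up names.
final class ParseCheck {

    // Header and rows copied from the sample input (the header is one line in the real file).
    static final String HEADER = "\"PATENT\",\"GYEAR\",\"GDATE\",\"APPYEAR\",\"COUNTRY\","
            + "\"POSTATE\",\"ASSIGNEE\",\"ASSCODE\",\"CLAIMS\",\"NCLASS\",\"CAT\",\"SUBCAT\","
            + "\"CMADE\",\"CRECEIVE\",\"RATIOCIT\",\"GENERAL\",\"ORIGINAL\",\"FWDAPLAG\","
            + "\"BCKGTLAG\",\"SELFCTUB\",\"SELFCTLB\",\"SECDUPBD\",\"SECDLWBD\"";
    static final String[] SAMPLE_ROWS = {
        "3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,",
        "3070802,1963,1096,,\"US\",\"TX\",,1,,2,6,63,,0,,,,,,,,,",
        "3070803,1963,1096,,\"US\",\"IL\",,1,,2,6,63,,9,,0.3704,,,,,,,",
        "3070804,1963,1096,,\"US\",\"OH\",,1,,2,6,63,,3,,0.6667,,,,,,,"
    };
    // Invented row (not from the dataset) whose CLAIMS column holds 5.
    static final String HYPOTHETICAL_ROW = "3070805,1963,1096,,\"AR\",\"\",,1,5,2,6,63,,0,,,,,,,,,";

    // Returns what the mapper would write as "key<TAB>value", or null if it writes nothing.
    static String mapLine(String line) {
        // Any negative limit keeps trailing empty fields instead of dropping them.
        String[] fields = line.split(",", -20);
        String country = fields[4];   // COUNTRY, double quotes included
        String numClaims = fields[8]; // CLAIMS
        if (numClaims.length() > 0 && !numClaims.startsWith("\"")) {
            return country + "\t" + numClaims + ",1";
        }
        return null; // empty CLAIMS, or the quoted "CLAIMS" header cell
    }

    public static void main(String[] args) {
        System.out.println("header -> " + mapLine(HEADER));
        for (String row : SAMPLE_ROWS) {
            System.out.println(row + " -> " + mapLine(row));
        }
        System.out.println(HYPOTHETICAL_ROW + " -> " + mapLine(HYPOTHETICAL_ROW));
    }
}
```

Note that split(",") is not a CSV parser: a quoted field containing a comma would shift every later column.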

The output of the whole MapReduce job looks like this

"AR"	5,1
"AR"	9,1
"AR"	2,1
"AR"	15,1
"AR"	13,1
"AR"	1,1
"AR"	34,1
"AR"	12,1
"AR"	8,1
"AR"	7,1
"AR"	23,1
"AR"	3,1
"AR"	4,1
"AR"	4,1

How can I debug and fix this? I am learning Hadoop.

As mentioned, the problem is that you are not overriding the default reduce method of the base Reducer class.

More specifically, one problem so far (possibly the only one) is that your reduce method signature is:

 public void reduce(Text key, Iterator<Text> values, Context context)
             throws IOException, InterruptedException

Instead, it should be:

 public void reduce(Text key, Iterable<Text> values, Context context)
             throws IOException, InterruptedException

The old-API version works because there you implement the reduce() method of the Reducer interface, whose signature does take an Iterator.

A good safeguard against this situation is to use @Override, because it forces a compile-time check for signature mismatches.

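Your reducer isn't "catching". There is probably a type mismatch or something similar, so your reduce method does not match the method of the base class it inherits from, and therefore it does not override it. The default reduce then runs instead. It behaves like the old API's IdentityReducer: it writes every (key, value) pair through unchanged, which is exactly what you are seeing.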
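The mechanism can be reproduced without Hadoop. In this sketch, MiniReducer is a hypothetical stand-in for the new-API Reducer, not the real class:
- Its default reduce() takes an Iterable and passes every value through unchanged.
- Its run() method plays the framework and always calls the Iterable signature.
- BrokenAverage declares an Iterator parameter, so it only adds an overload that is never called.
- FixedAverage uses Iterable, so it is a genuine override, which @Override lets the compiler confirm.

The data is the 14 "AR" values from the question's output, treated as if they were all the AR records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the new-API Reducer base class; not Hadoop's real class.
class MiniReducer {

    // Default behavior, like the base Reducer: write every value through unchanged.
    protected void reduce(String key, Iterable<String> values, List<String> out) {
        for (String value : values) {
            out.add(key + "\t" + value);
        }
    }

    // Plays the framework: it only ever calls reduce(String, Iterable, List).
    final List<String> run(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        reduce(key, values, out);
        return out;
    }
}

// Iterator parameter: an overload, not an override, so run() never reaches it.
class BrokenAverage extends MiniReducer {
    protected void reduce(String key, Iterator<String> values, List<String> out) {
        double sum = 0.0;
        int count = 0;
        while (values.hasNext()) {
            String[] fields = values.next().split(",");
            sum += Double.parseDouble(fields[0]);
            count += Integer.parseInt(fields[1]);
        }
        out.add(key + "\t" + (sum / count));
    }
}

// Iterable parameter: a genuine override, and @Override makes the compiler check it.
class FixedAverage extends MiniReducer {
    @Override
    protected void reduce(String key, Iterable<String> values, List<String> out) {
        double sum = 0.0;
        int count = 0;
        for (String value : values) {
            String[] fields = value.split(",");
            sum += Double.parseDouble(fields[0]);
            count += Integer.parseInt(fields[1]);
        }
        out.add(key + "\t" + (sum / count));
    }
}

final class OverrideDemo {
    // The 14 "AR" map outputs listed in the question.
    static final List<String> AR_VALUES = Arrays.asList(
            "5,1", "9,1", "2,1", "15,1", "13,1", "1,1", "34,1",
            "12,1", "8,1", "7,1", "23,1", "3,1", "4,1", "4,1");

    public static void main(String[] args) {
        // Prints the 14 values unchanged, just like the question's job output.
        new BrokenAverage().run("\"AR\"", AR_VALUES).forEach(System.out::println);
        // Prints one line: the key, a tab, and the average.
        new FixedAverage().run("\"AR\"", AR_VALUES).forEach(System.out::println);
    }
}
```

With these values, BrokenAverage reproduces the 14 pass-through lines. FixedAverage emits a single line: "AR", a tab, then 140 / 14 = 10.0.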

To make sure you are actually overriding, add @Override (with a capital O, since Java annotations are case-sensitive):

@Override
public void reduce(Text key, Iterator<Text> values, Context context)

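This will produce a compile error ("method does not override or implement a method from a supertype") because the method signature doesn't match. Hopefully this helps you diagnose the problem.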

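  • I am currently using hadoop-core-1.0.3.jar and trying to write the MapReduce job with the new API; I don't know why it isn't working.
  • This program is part of the Hadoop in Action code; I am learning Hadoop from that book.
  • When I run the same MapReduce program with the old API syntax, it works perfectly.
  • The code looks like this (including the Combiner; I tested it with the Combiner first):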
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import java.io.IOException;
    import java.util.Iterator;

    public class AveragingWithCombiner extends Configured implements Tool {

        public static class MapClass extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {

            static enum ClaimsCounters { MISSING, QUOTED };

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, Text> output,
                            Reporter reporter) throws IOException {
                String fields[] = value.toString().split(",", -20);
                String country = fields[4];
                String numClaims = fields[8];
                if (numClaims.length() > 0 && !numClaims.startsWith("\"")) {
                    output.collect(new Text(country), new Text(numClaims + ",1"));
                }
            }
        }

        public static class Combine extends MapReduceBase
                implements Reducer<Text, Text, Text, Text> {

            public void reduce(Text key, Iterator<Text> values,
                               OutputCollector<Text, Text> output,
                               Reporter reporter) throws IOException {
                double sum = 0;
                int count = 0;
                while (values.hasNext()) {
                    String fields[] = values.next().toString().split(",");
                    sum += Double.parseDouble(fields[0]);
                    count += Integer.parseInt(fields[1]);
                }
                output.collect(key, new Text(sum + "," + count));
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, Text, Text, DoubleWritable> {

            public void reduce(Text key, Iterator<Text> values,
                               OutputCollector<Text, DoubleWritable> output,
                               Reporter reporter) throws IOException {
                double sum = 0;
                int count = 0;
                while (values.hasNext()) {
                    String fields[] = values.next().toString().split(",");
                    sum += Double.parseDouble(fields[0]);
                    count += Integer.parseInt(fields[1]);
                }
                output.collect(key, new DoubleWritable(sum / count));
            }
        }

        public int run(String[] args) throws Exception {
            // Configuration processed by ToolRunner
            Configuration conf = getConf();

            // Create a JobConf using the processed conf
            JobConf job = new JobConf(conf, AveragingWithCombiner.class);

            // Process custom command-line options
            Path in = new Path(args[0]);
            Path out = new Path(args[1]);
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);

            // Specify various job-specific parameters
            job.setJobName("AveragingWithCombiner");
            job.setMapperClass(MapClass.class);
            job.setCombinerClass(Combine.class);
            job.setReducerClass(Reduce.class);
            job.setInputFormat(TextInputFormat.class);
            job.setOutputFormat(TextOutputFormat.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            // Submit the job, then poll for progress until the job is complete
            JobClient.runJob(job);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // Let ToolRunner handle generic command-line options
            int res = ToolRunner.run(new Configuration(), new AveragingWithCombiner(), args);
            System.exit(res);
        }
    }
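The Combine class above emits sum + "," + count rather than a partial average. The reason is that an average of per-split averages is not the overall average when the splits differ in size. The Hadoop-free sketch below runs the same loops as Combine.reduce() and Reduce.reduce() to show this.

Assumptions: CombinerAverage and the two splits are hypothetical. The split boundaries are invented; the values are the "AR" values from the question's output.

Hadoop may run the combiner zero, one, or several times. That is also why Combine must emit values in the same "claims,count" shape it consumes.

```java
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch of the "sum,count" combiner pattern; CombinerAverage is a made-up name.
final class CombinerAverage {

    // Two hypothetical map splits of unequal size, filled with the "AR" values from the question.
    static final List<String> SPLIT_1 =
            Arrays.asList("5,1", "9,1", "2,1", "15,1", "13,1", "1,1", "34,1", "12,1");
    static final List<String> SPLIT_2 =
            Arrays.asList("8,1", "7,1", "23,1", "3,1", "4,1", "4,1");

    // Same loop as Combine.reduce(): fold "claims,count" values into one "sum,count" value.
    static String combine(List<String> values) {
        double sum = 0;
        int count = 0;
        for (String v : values) {
            String[] fields = v.split(",");
            sum += Double.parseDouble(fields[0]);
            count += Integer.parseInt(fields[1]);
        }
        return sum + "," + count;
    }

    // Same loop as Reduce.reduce(), followed by the division that produces the average.
    static double average(List<String> values) {
        String[] totals = combine(values).split(",");
        return Double.parseDouble(totals[0]) / Integer.parseInt(totals[1]);
    }

    public static void main(String[] args) {
        // Each split's combiner output keeps both the partial sum and the partial count.
        List<String> combined = Arrays.asList(combine(SPLIT_1), combine(SPLIT_2));
        double correct = average(combined);
        // Averaging the per-split averages ignores that the splits hold different counts.
        double naive = (average(SPLIT_1) + average(SPLIT_2)) / 2;
        System.out.println("combiner outputs: " + combined);
        System.out.println("correct average:  " + correct);
        System.out.println("naive average:    " + naive);
    }
}
```

Here the combiner outputs are "91.0,8" and "49.0,6", and the reducer returns 140 / 14 = 10.0. The naive average of the two split averages, (11.375 + 8.1667) / 2, comes out at about 9.77.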


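Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM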