Why doesn't job chaining work in MapReduce?

Question

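I created two jobs and want to chain them so that the second job runs only after the first one has completed. So I wrote the code below. However, from what I can observe, job1 completes correctly while job2 never seems to execute.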

public class Simpletask extends Configured implements Tool {
public static enum FileCounters {
    COUNT;
}
public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text>{

      public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
              String line = itr.nextToken();
              String part[] = line.split(",");
              int id = Integer.valueOf(part[0]);
              int x1 = Integer.valueOf(part[1]);
              int y1 = Integer.valueOf(part[2]);
              int z1 = Integer.valueOf(part[3]);
              int x2 = Integer.valueOf(part[4]);
              int y2 = Integer.valueOf(part[5]);
              int z2 = Integer.valueOf(part[6]);
              int h_v = Hilbert(x1,y1,z1);
              int parti = h_v/10;
             IntWritable partition = new IntWritable(parti);
             Text neuron = new Text();
             neuron.set(line);
             context.write(partition,neuron);
          }
}
public int Hilbert(int x,int y,int z){
          return (int) (Math.random()*20);
      }
  }

public static class IntSumReducer extends Reducer<IntWritable,Text,IntWritable,Text> {

private Text result = new Text();
private MultipleOutputs<IntWritable, Text> mos;
public void setup(Context context) {
    mos = new MultipleOutputs<IntWritable, Text>(context);
}
<K, V> String generateFileName(K k) {
       return "p"+k.toString();
}
public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
    String accu = "";
    for (Text val : values) {
        String[] entry=val.toString().split(",");
        String MBR = entry[1];
        accu+=entry[0]+",MBR"+MBR+" ";
    }
    result.set(accu);
    context.getCounter(FileCounters.COUNT).increment(1);
    mos.write(key, result, generateFileName(key));
}
}

public static class RTreeMapper extends Mapper<Object, Text, IntWritable, Text>{
  public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
   System.out.println("WOWOWOWOW RUNNING");// NOTHING SHOWS UP!
  }
  }

public static class RTreeReducer extends Reducer<IntWritable,Text,IntWritable,Text> {
private MultipleOutputs<IntWritable, Text> mos;
Text t = new Text();

public void setup(Context context) {
    mos = new MultipleOutputs<IntWritable, Text>(context);
}
public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
    t.set("dsfs");
    mos.write(key, t, "WOWOWOWOWOW"+key.get());
//ALSO, NOTHING IS WRITTEN TO THE FILE!!!!!
}
}
public static class RTreeInputFormat extends TextInputFormat{
 protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}

public static void main(String[] args) throws Exception {
    if (args.length != 2) {
           System.err.println("Enter valid number of arguments <Inputdirectory>  <Outputlocation>");
           System.exit(0);
          }
          ToolRunner.run(new Configuration(), new Simpletask(), args);
}

@Override
public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Job1");
    job.setJarByClass(Simpletask.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    boolean complete = job.waitForCompletion(true);

    //================RTree Loop============
    int capacity = 3;
    Configuration rconf = new Configuration();
    Job rtreejob = Job.getInstance(rconf, "rtree");
    if(complete){
        int count =  (int) job.getCounters().findCounter(FileCounters.COUNT).getValue();
        System.out.println("File count: "+count);
        String path = null;
        for(int i=0;i<count;i++){
            path = "/Worker/p"+i+"-m-00000";
            System.out.println("Add input path: "+path);
            FileInputFormat.addInputPath(rtreejob, new Path(path));
        }
        System.out.println("Input path done.");
        FileOutputFormat.setOutputPath(rtreejob, new Path("/RTree"));
        rtreejob.setJarByClass(Simpletask.class);
        rtreejob.setMapperClass(RTreeMapper.class);
        rtreejob.setCombinerClass(RTreeReducer.class);
        rtreejob.setReducerClass(RTreeReducer.class);
        rtreejob.setOutputKeyClass(IntWritable.class);
        rtreejob.setOutputValueClass(Text.class);
        rtreejob.setInputFormatClass(RTreeInputFormat.class);
        complete = rtreejob.waitForCompletion(true);
}
    return 0;
}
}

Answer 1

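For a MapReduce job, the output directory must not already exist. The framework checks the output directory first, and if it exists, the job fails. In your case, you specified the same output directory for both jobs. I modified your code: in job2 I changed args[1] to args[2]. The third argument is now the output directory for the second job, so you also need to pass a third argument.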

    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Job1");
    job.setJarByClass(Simpletask.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    //AND THEN I WAIT THIS JOB TO COMPLETE.
    boolean complete = job.waitForCompletion(true);

    //I START A NEW JOB, BUT WHY IS IT NOT RUNNING?
    Configuration conf = new Configuration();
    Job job2 = Job.getInstance(conf, "Job2");
    job2.setJarByClass(Simpletask.class);
    job2.setMapperClass(TokenizerMapper.class);
    job2.setCombinerClass(IntSumReducer.class);
    job2.setReducerClass(IntSumReducer.class);
    job2.setOutputKeyClass(IntWritable.class);
    job2.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job2, new Path(args[0]));
    FileOutputFormat.setOutputPath(job2, new Path(args[2]));
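
Since the fix relies on a third argument and on the two output directories being different, it may help to check both before configuring any job. Below is a minimal sketch in plain Java with no Hadoop calls. The class name JobArgs and the method validate are hypothetical, and comparing paths with java.nio.file.Paths is an assumption made for illustration; a real driver would resolve HDFS paths through Hadoop's own FileSystem.

```java
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: checks the command-line
// arguments before any job is configured.
final class JobArgs {
    private JobArgs() {}

    // Returns null when the arguments look usable, otherwise an error message.
    static String validate(String[] args) {
        if (args == null || args.length != 3) {
            return "Usage: <input dir> <job1 output dir> <job2 output dir>";
        }
        // Normalize so that spellings such as "/out" and "/out/" compare equal.
        String out1 = Paths.get(args[1]).normalize().toString();
        String out2 = Paths.get(args[2]).normalize().toString();
        if (out1.equals(out2)) {
            return "job1 and job2 must not share an output directory: " + out1;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            return;
        }
        System.out.println("Arguments OK");
    }
}
```

This only catches the case where both jobs are pointed at the same path. It does not detect a directory left over from an earlier run, which would still make the job fail at submission.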

Answer 2

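Several possible causes of the error:

conf is declared twice (is there no compile error there?)
The output path of job2 already exists, because job1 created it (+1 to Amal G Jose's answer)
I think you should also call job.setMapOutputKeyClass(Text.class); and job.setMapOutputValueClass(IntWritable.class); for both jobs.
After the posted snippet, do you also have a command that executes job2? I mean, do you actually run job2.waitForCompletion(true); or something similar?

Overall: check the logs for error messages, which should clearly explain what went wrong.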
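
The point about actually running job2, and noticing when a job fails, can be sketched as a small sequential runner. This is plain Java, not Hadoop API: the class JobChain is hypothetical, and each Callable<Boolean> is a stand-in for a call such as job.waitForCompletion(true). The runner stops at the first step that fails or throws, and returns a non-zero exit code instead of an unconditional 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper: runs named steps one after another and stops
// at the first one that reports failure.
final class JobChain {
    private final Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();
    private final List<String> completed = new ArrayList<>();

    // Registers a step; steps run in the order they were added.
    JobChain add(String name, Callable<Boolean> step) {
        steps.put(name, step);
        return this;
    }

    // Returns 0 if every step succeeded, 1 as soon as one fails or throws.
    int run() {
        for (Map.Entry<String, Callable<Boolean>> entry : steps.entrySet()) {
            boolean ok;
            try {
                ok = Boolean.TRUE.equals(entry.getValue().call());
            } catch (Exception ex) {
                System.err.println(entry.getKey() + " threw: " + ex);
                ok = false;
            }
            if (!ok) {
                System.err.println("Step failed, stopping chain: " + entry.getKey());
                return 1;
            }
            completed.add(entry.getKey());
        }
        return 0;
    }

    // Names of the steps that finished successfully, in order.
    List<String> completed() {
        return completed;
    }

    public static void main(String[] args) {
        int exitCode = new JobChain()
                .add("job1", () -> true)  // stand-in for job.waitForCompletion(true)
                .add("job2", () -> true)  // stand-in for job2.waitForCompletion(true)
                .run();
        System.out.println("exit code: " + exitCode);
    }
}
```

In a real driver, the value returned by run() would be passed on from Tool.run so that the caller can tell which job stopped the chain.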

Answer 1
1 2015-07-31 06:01:45

Answer 2
1 2015-07-31 07:34:13
