Why is job chaining not working in MapReduce?
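I created two jobs and want to chain them so that the second job runs only after the first one has completed, so I wrote the code below. However, as far as I can observe, job1 completes correctly, while job2 never seems to execute.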
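import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Simpletask extends Configured implements Tool {
    public static enum FileCounters {
        COUNT;
    }
    public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text>{
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                String line = itr.nextToken();
                String part[] = line.split(",");
                int id = Integer.valueOf(part[0]);
                int x1 = Integer.valueOf(part[1]);
                int y1 = Integer.valueOf(part[2]);
                int z1 = Integer.valueOf(part[3]);
                int x2 = Integer.valueOf(part[4]);
                int y2 = Integer.valueOf(part[5]);
                int z2 = Integer.valueOf(part[6]);
                int h_v = Hilbert(x1,y1,z1);
                int parti = h_v/10;
                IntWritable partition = new IntWritable(parti);
                Text neuron = new Text();
                neuron.set(line);
                context.write(partition,neuron);
            }
        }
        public int Hilbert(int x,int y,int z){
            return (int) (Math.random()*20);
        }
    }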
    public static class IntSumReducer extends Reducer<IntWritable,Text,IntWritable,Text> {
        private Text result = new Text();
        private MultipleOutputs<IntWritable, Text> mos;
        public void setup(Context context) {
            mos = new MultipleOutputs<IntWritable, Text>(context);
        }
        <K, V> String generateFileName(K k) {
            return "p"+k.toString();
        }
        public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
            String accu = "";
            for (Text val : values) {
                String[] entry=val.toString().split(",");
                String MBR = entry[1];
                accu+=entry[0]+",MBR"+MBR+" ";
            }
            result.set(accu);
            context.getCounter(FileCounters.COUNT).increment(1);
            mos.write(key, result, generateFileName(key));
        }
    }
    public static class RTreeMapper extends Mapper<Object, Text, IntWritable, Text>{
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            System.out.println("WOWOWOWOW RUNNING");// NOTHING SHOWS UP!
        }
    }
    public static class RTreeReducer extends Reducer<IntWritable,Text,IntWritable,Text> {
        private MultipleOutputs<IntWritable, Text> mos;
        Text t = new Text();
        public void setup(Context context) {
            mos = new MultipleOutputs<IntWritable, Text>(context);
        }
        public void reduce(IntWritable key,Iterable<Text> values, Context context) throws IOException, InterruptedException {
            t.set("dsfs");
            mos.write(key, t, "WOWOWOWOWOW"+key.get());
            //ALSO, NOTHING IS WRITTEN TO THE FILE!!!!!
        }
    }
    public static class RTreeInputFormat extends TextInputFormat{
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Enter valid number of arguments <Inputdirectory> <Outputlocation>");
            System.exit(0);
        }
        ToolRunner.run(new Configuration(), new Simpletask(), args);
    }
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Job1");
        job.setJarByClass(Simpletask.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        boolean complete = job.waitForCompletion(true);
        //================RTree Loop============
        int capacity = 3;
        Configuration rconf = new Configuration();
        Job rtreejob = Job.getInstance(rconf, "rtree");
        if(complete){
            int count = (int) job.getCounters().findCounter(FileCounters.COUNT).getValue();
            System.out.println("File count: "+count);
            String path = null;
            for(int i=0;i<count;i++){
                path = "/Worker/p"+i+"-m-00000";
                System.out.println("Add input path: "+path);
                FileInputFormat.addInputPath(rtreejob, new Path(path));
            }
            System.out.println("Input path done.");
            FileOutputFormat.setOutputPath(rtreejob, new Path("/RTree"));
            rtreejob.setJarByClass(Simpletask.class);
            rtreejob.setMapperClass(RTreeMapper.class);
            rtreejob.setCombinerClass(RTreeReducer.class);
            rtreejob.setReducerClass(RTreeReducer.class);
            rtreejob.setOutputKeyClass(IntWritable.class);
            rtreejob.setOutputValueClass(Text.class);
            rtreejob.setInputFormatClass(RTreeInputFormat.class);
            complete = rtreejob.waitForCompletion(true);
        }
        return 0;
    }
}
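For a MapReduce job, the output directory must not already exist. The job checks the output directory first, and if it exists, the job fails. In your case, you specified the same output directory for both jobs. I modified your code: in job2 I changed args[1] to args[2]. The third argument is now the output directory of the second job, so you need to pass a third argument as well.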
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Job1");
job.setJarByClass(Simpletask.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// Wait for the first job to complete; stop here if it failed.
boolean complete = job.waitForCompletion(true);
if (!complete) {
    return 1;
}
// Start the second job. Its Configuration needs its own variable name,
// because declaring conf twice in the same scope does not compile.
Configuration conf2 = new Configuration();
Job job2 = Job.getInstance(conf2, "Job2");
job2.setJarByClass(Simpletask.class);
job2.setMapperClass(TokenizerMapper.class);
job2.setCombinerClass(IntSumReducer.class);
job2.setReducerClass(IntSumReducer.class);
job2.setOutputKeyClass(IntWritable.class);
job2.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job2, new Path(args[0]));
// args[2], not args[1]: the second job gets its own output directory.
FileOutputFormat.setOutputPath(job2, new Path(args[2]));
// Submit the second job and wait for it; without this call job2 never runs.
return job2.waitForCompletion(true) ? 0 : 1;
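Since the second job writes to args[2], the driver needs three arguments, and the two output directories have to differ. A minimal sketch of such an up-front check follows, in plain Java with no Hadoop dependency. The class name DriverArgs and the method validate are hypothetical names chosen for illustration; they are not part of the original code, and the argument layout is an assumption taken from the answer above.

```java
/** Hypothetical helper that validates the arguments of a two-job driver. */
final class DriverArgs {
    /**
     * Returns null if the arguments are usable, otherwise a message describing the problem.
     * Assumed layout: args[0] = input, args[1] = job1 output, args[2] = job2 output.
     * Paths are compared as plain strings, so "/out" and "/out/" count as different.
     */
    static String validate(String[] args) {
        if (args == null || args.length != 3) {
            return "Usage: <input> <job1 output> <job2 output>";
        }
        if (args[1].equals(args[2])) {
            return "job1 and job2 must not share an output directory";
        }
        if (args[0].equals(args[1]) || args[0].equals(args[2])) {
            // The input directory exists, so it cannot also be an output directory.
            return "an output directory must not be the input directory";
        }
        return null;
    }

    public static void main(String[] args) {
        // Demonstrates the check on two sample argument lists.
        String[][] samples = {
            {"/data/in", "/data/out1", "/data/out2"},
            {"/data/in", "/data/out", "/data/out"}
        };
        for (String[] sample : samples) {
            String problem = validate(sample);
            System.out.println(String.join(" ", sample) + " -> "
                    + (problem == null ? "ok" : problem));
        }
    }
}
```

A check like this could replace the args.length != 2 test in main, which would otherwise reject the three-argument invocation. It does not look at the file system, so a leftover output directory from an earlier run would still make the job fail.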
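Several possible causes of the error in the originally posted two-job snippet:
- conf is declared twice in the same scope (was there no compile error?).
- The map output types are not set explicitly. If they differ from the job's output types, call job.setMapOutputKeyClass(...) and job.setMapOutputValueClass(...) on both jobs, passing the classes the mapper actually emits (IntWritable keys and Text values for TokenizerMapper).
- job2 is never submitted: there is no job2.waitForCompletion(true); or similar call.
Overall: check the logs for an error message; it should explain clearly what went wrong.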
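The control flow both answers point at (wait for the first job, start the second only if the first succeeded, and actually wait for the second) can be sketched without Hadoop. In this plain-Java sketch each BooleanSupplier stands in for a call such as job.waitForCompletion(true). The class JobChain and the method runInOrder are hypothetical names for illustration, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of sequential job chaining; not part of Hadoop. */
final class JobChain {
    /**
     * Runs the steps in order and stops at the first failure.
     * Returns 0 if every step succeeded, otherwise the 1-based index of the
     * step that failed. Later steps are not run after a failure.
     */
    static int runInOrder(List<BooleanSupplier> steps) {
        for (int i = 0; i < steps.size(); i++) {
            if (!steps.get(i).getAsBoolean()) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> ran = new ArrayList<>();
        List<BooleanSupplier> steps = new ArrayList<>();
        // Each lambda is a stand-in for configuring a job and waiting for it.
        steps.add(() -> { ran.add("job1"); return true; });
        steps.add(() -> { ran.add("job2"); return true; });
        int status = runInOrder(steps);
        System.out.println("ran " + ran + ", status " + status);
    }
}
```

In a Tool implementation, a status like this could be the return value of run, and main could pass the result of ToolRunner.run to System.exit so that a failed job is visible to the caller. The run method in the question always returns 0, which hides failures.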