Hadoop jar command error for multiple mapper inputs and 1 reducer output (join 2 values from 2 files)
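This is my sample program, which joins 2 datasets. It has 2 mappers and 1 reducer; the reducer combines the values emitted by the 2 mappers, each of which reads a different input file.
I get an error from the hadoop jar command.
Command:
hadoop jar /home/rahul/Downloads/testjars/datajoin.jar DataJoin /user/rahul/cust.txt /user/rahul/delivery.txt /user/rahul/output
Error: invalid number of arguments for DataJoin
It seems to expect only 1 input path and 1 output path, whereas my command passes 2 inputs (one per mapper) and 1 output.
Can anyone help me?
Code: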
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DataJoin {

    // Mapper for the first input file: tags each value with "CD~".
    public static class TokenizerMapper1 extends Mapper<Object, Text, Text, Text> {
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String itr[] = value.toString().split("::");
            word.set(itr[0].trim());
            context.write(word, new Text("CD~" + itr[1]));
        }
    }

    // Mapper for the second input file: tags each value with "DD~".
    public static class TokenizerMapper2 extends Mapper<Object, Text, Text, Text> {
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String itr[] = value.toString().split("::");
            word.set(itr[0].trim());
            context.write(word, new Text("DD~" + itr[1]));
        }
    }

    // Reducer: concatenates all tagged values that share the same key.
    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String sum = "";
            for (Text val : values) {
                sum += val.toString();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: DataJoin ");
            System.exit(2);
        }
        Job job = new Job(conf, "Data Join");
        job.setJarByClass(DataJoin.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // One input path per mapper, then a single output path.
        MultipleInputs.addInputPath(job, new Path(otherArgs[0]),
                TextInputFormat.class, TokenizerMapper1.class);
        MultipleInputs.addInputPath(job, new Path(otherArgs[1]),
                TextInputFormat.class, TokenizerMapper2.class);
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The error is in this part:
    if (otherArgs.length != 2) {
        System.err.println("Usage: DataJoin ");
        System.exit(2);
    }
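You are passing 3 arguments: 2 inputs and 1 output.
Note that length is the number of elements (counted 1, 2, 3), not the highest index (the indices run 0, 1, 2), so it is 3 here.
Change the check to: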
    if (otherArgs.length != 3) {
        System.err.println("Usage: DataJoin <in1> <in2> <out>");
        // Exit with a non-zero status so the shell sees the usage error as a failure.
        System.exit(2);
    }
That should solve your problem.
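If it helps, the argument check can be tried on its own without a Hadoop cluster. The following is only a minimal sketch in plain Java: the JoinArgs class, its fields and the usage text are hypothetical names made up for illustration and are not part of Hadoop or of the original program. It assumes the job always takes exactly two input paths followed by one output path.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or the original post.
final class JoinArgs {
    static final String USAGE = "Usage: DataJoin <in1> <in2> <out>";

    final List<String> inputs; // the two input paths, in order
    final String output;       // the single output path

    private JoinArgs(List<String> inputs, String output) {
        this.inputs = inputs;
        this.output = output;
    }

    // Accepts exactly two input paths followed by one output path.
    static JoinArgs parse(String[] args) {
        // args.length is the element count (3 here); the valid indices are 0, 1, 2.
        if (args.length != 3) {
            throw new IllegalArgumentException(USAGE);
        }
        return new JoinArgs(Arrays.asList(args[0], args[1]), args[2]);
    }

    public static void main(String[] args) {
        try {
            JoinArgs parsed = parse(args);
            System.out.println("inputs=" + parsed.inputs + " output=" + parsed.output);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(2); // non-zero status signals the usage error
        }
    }
}
```

In the real driver, the same check would be applied to the array returned by getRemainingArgs() before the paths are handed to MultipleInputs.addInputPath and FileOutputFormat.setOutputPath.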