Hadoop jar command error with multiple mapper inputs and 1 reducer output (joining 2 values from 2 files)

Question

Here is my sample program that joins 2 datasets. It has 2 mappers, each reading a different input file, and 1 reducer that joins the values emitted by the two mappers.

I am getting an error from the hadoop jar command.

Command:

hadoop jar /home/rahul/Downloads/testjars/datajoin.jar DataJoin /user/rahul/cust.txt /user/rahul/delivery.txt /user/rahul/output

Error: Invalid number of arguments Datajoin

The program appears to expect only 1 input path and 1 output path, whereas my command passes 2 inputs (one for each mapper) and 1 output.

Can anyone help me out?

Code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class DataJoin {

    // Mapper for the first input path: tags each value with "CD~".
    public static class TokenizerMapper1 extends Mapper<Object, Text, Text, Text> {

        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            String itr[] = value.toString().split("::");
            word.set(itr[0].trim());
            context.write(word, new Text("CD~" + itr[1]));
        }
    }

    // Mapper for the second input path: tags each value with "DD~".
    public static class TokenizerMapper2 extends Mapper<Object, Text, Text, Text> {

        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            String itr[] = value.toString().split("::");
            word.set(itr[0].trim());
            context.write(word, new Text("DD~" + itr[1]));
        }
    }

    // Reducer: concatenates every tagged value that shares a key.
    public static class IntSumReducer extends Reducer<Text, Text, Text, Text> {
        private Text result = new Text();

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String sum = "";
            for (Text val : values) {
                sum += val.toString();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: DataJoin ");
            System.exit(2);
        }
        Job job = new Job(conf, "Data Join");
        job.setJarByClass(DataJoin.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        MultipleInputs.addInputPath(job, new Path(otherArgs[0]),
                TextInputFormat.class, TokenizerMapper1.class);
        MultipleInputs.addInputPath(job, new Path(otherArgs[1]),
                TextInputFormat.class, TokenizerMapper2.class);
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
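
For readers who want to see the intended data flow without a cluster, the join can be imitated in plain Java. This is only a rough sketch under stated assumptions: the class name JoinSimulation and the sample records are made up for illustration, input lines are assumed to have the key::value layout that the mappers split on, and the values for a key arrive in a fixed order here, which a real Hadoop reducer does not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; it does not use any Hadoop classes.
class JoinSimulation {

    // Mirrors the map step: split "key::value" and tag the value with its source.
    static void map(String line, String tag, Map<String, List<String>> shuffle) {
        String[] parts = line.split("::");
        shuffle.computeIfAbsent(parts[0].trim(), k -> new ArrayList<>())
               .add(tag + "~" + parts[1]);
    }

    // Mirrors the reduce step: concatenate all tagged values collected per key.
    static Map<String, String> join(List<String> first, List<String> second) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : first) {
            map(line, "CD", shuffle);   // what TokenizerMapper1 emits
        }
        for (String line : second) {
            map(line, "DD", shuffle);   // what TokenizerMapper2 emits
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : shuffle.entrySet()) {
            result.put(entry.getKey(), String.join("", entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records, one list per input file.
        List<String> first = Arrays.asList("101::Alice", "102::Bob");
        List<String> second = Arrays.asList("101::Delivered", "102::Pending");
        join(first, second).forEach((key, value) -> System.out.println(key + "\t" + value));
    }
}
```

Each key ends up with one "CD~" value and one "DD~" value concatenated, which is the shape of output the reducer in the question writes.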

Answer 1

The error is in this portion of your code:

if (otherArgs.length != 2) {
   System.err.println("Usage: DataJoin ");
   System.exit(2);
}

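Your command passes 3 arguments: 2 inputs and 1 output.

The length of the array is a count of its elements (1, 2, 3, ...), not its highest index (0, 1, 2, ...). Your code reads otherArgs[0], otherArgs[1] and otherArgs[2], so the length it needs is 3, not 2.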
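
The difference between an array's length and its last index can be checked with a small stand-alone example. This is a minimal sketch: the class ArgLengthDemo and its lastIndex helper are hypothetical and exist only for this illustration; the three paths are the ones from the command in the question.

```java
// Hypothetical demo class; not part of the original DataJoin code.
class ArgLengthDemo {

    // Returns the highest valid index of the array, or -1 if it is empty.
    static int lastIndex(String[] args) {
        return args.length - 1;
    }

    public static void main(String[] unused) {
        // The same three paths that follow the class name on the command line.
        String[] otherArgs = {
            "/user/rahul/cust.txt",
            "/user/rahul/delivery.txt",
            "/user/rahul/output"
        };
        // length counts the elements; the last index is always one less.
        System.out.println("length     = " + otherArgs.length);
        System.out.println("last index = " + lastIndex(otherArgs));
    }
}
```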

Change the check to:

if (otherArgs.length != 3) {
   System.err.println("Usage: DataJoin ");
   System.exit(2);
}

This solves your issue.
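
One way to make this kind of check harder to get wrong is to derive the expected count from named constants rather than a bare number. The following is a sketch only, with no Hadoop dependency: the class JoinArgs, its fields, and the wording of the usage message are assumptions made for this example and do not come from the original program.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the original DataJoin class.
final class JoinArgs {
    static final int INPUT_COUNT = 2;                  // one input path per mapper
    static final int EXPECTED_ARGS = INPUT_COUNT + 1;  // the inputs plus one output path

    final List<String> inputs;
    final String output;

    private JoinArgs(List<String> inputs, String output) {
        this.inputs = inputs;
        this.output = output;
    }

    // Splits the arguments into inputs and output.
    // Throws IllegalArgumentException when the count is not EXPECTED_ARGS.
    static JoinArgs parse(String[] args) {
        if (args.length != EXPECTED_ARGS) {
            throw new IllegalArgumentException(
                    "Usage: DataJoin <input1> <input2> <output> (got "
                            + args.length + " arguments)");
        }
        List<String> inputs = Arrays.asList(Arrays.copyOfRange(args, 0, INPUT_COUNT));
        return new JoinArgs(inputs, args[INPUT_COUNT]);
    }
}
```

In main, the driver could call JoinArgs.parse(otherArgs), print the exception message and exit with a non-zero status if it throws, and otherwise pass inputs.get(0) and inputs.get(1) to MultipleInputs.addInputPath and output to FileOutputFormat.setOutputPath.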

Answer 1 posted 2014-10-13 04:53:47
