
Hadoop jar command error

While running the JAR file with the hadoop jar command, I am getting the error below:

#hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
15/11/06 19:46:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/11/06 19:46:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:8020/var/lib/hadoop-0.20/cache/mapred/mapred/staging/root/.staging/job_201511061734_0003
15/11/06 19:46:32 ERROR security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /MRInput57/Input-Big.txt already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:921)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:526)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:556)
    at MapReduce.WordCountNew.main(WordCountNew.java:114)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)


My driver class's main method is as follows:

    public static void main(String[] args) throws IOException, Exception {
        // Configuration details w.r.t. the job and JAR file
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WORDCOUNTJOB");

        // Setting Driver class
        job.setJarByClass(MapReduceWordCount.class);
        // Setting the Mapper class
        job.setMapperClass(TokenizerMapper.class);
        // Setting the Combiner class
        job.setCombinerClass(IntSumReducer.class);
        // Setting the Reducer class
        job.setReducerClass(IntSumReducer.class);
        // Setting the Output Key class
        job.setOutputKeyClass(Text.class);
        // Setting the Output value class
        job.setOutputValueClass(IntWritable.class);
        // Adding the Input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Setting the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // System exit strategy
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Can someone please rectify the issue in my code?

Regards, Pranav

You need to check whether the output directory already exists and delete it if it does. MapReduce can't (or won't) write files into an output directory that already exists; it has to create the directory itself to be sure it isn't overwriting existing output.

Add this:

// Delete the output directory if it already exists
Path outPath = new Path(args[1]);
FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
if (dfs.exists(outPath)) {
    dfs.delete(outPath, true); // true = delete recursively
}

The output directory must not exist before the program runs. Either delete the existing directory, point the job at a new directory, or delete the output directory from within your program.

I prefer deleting the output directory from the command prompt before running the program.

From the command prompt:

hdfs dfs -rm -r <your_output_directory_HDFS_URL>
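
For the job in the question, that would be:

hdfs dfs -rm -r /MROutput57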

From Java:

Chris Gerken's code above is good enough.
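
For reference, a minimal sketch of the whole driver with that deletion added, reusing the class and variable names from the question (WordCountNew, conf, job, args); TokenizerMapper and IntSumReducer are assumed to be the question's own mapper/reducer classes on the classpath, and the only import added over the original driver is org.apache.hadoop.fs.FileSystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountNew {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "WORDCOUNTJOB");

        // Job setup as in the question (setJarByClass now points at this driver class)
        job.setJarByClass(WordCountNew.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Delete a stale output directory before submitting the job, so that
        // FileOutputFormat.checkOutputSpecs() does not fail with
        // FileAlreadyExistsException
        Path outPath = new Path(args[1]);
        FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
        if (dfs.exists(outPath)) {
            dfs.delete(outPath, true); // true = delete recursively
        }

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}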

The output directory you are trying to create to store the output already exists. So either delete the previously created directory with the same name, or change the name of the output directory.
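
For example, pointing the job at a fresh output path also avoids the error (here /MROutput58 is just an illustrative name for any HDFS directory that does not exist yet):

hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput58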

As others have noted, you are getting the error because the output directory already exists, most likely because you have tried executing this job before.

You can remove the existing output directory right before you run the job, i.e.:

#hadoop fs -rm -r /MROutput57 && \
hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57
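
If the chained command should also succeed on a first run, when /MROutput57 does not exist yet, newer Hadoop releases (2.x and later) accept a -f flag on fs -rm that suppresses the error for a missing path:

hadoop fs -rm -r -f /MROutput57 && \
hadoop jar WordCountNew.jar WordCountNew /MRInput57/Input-Big.txt /MROutput57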
