MapReduce Hadoop job exception: Output directory already exists

I'm running a MapReduce job with the following driver code, and it keeps giving me the exception below. I made sure to remove the folder before starting the job, but it doesn't help.

The code:

    JobConf jobConf = new JobConf( getConf(), MPTU.class );
    jobConf.setJobName( "MPTU" );

    AvroJob.setMapperClass( jobConf, MPTUMapper.class );
    AvroJob.setReducerClass( jobConf, MPTUReducer.class );

    long milliSeconds = 1000 * 60 * 60;
    jobConf.setLong( "mapred.task.timeout", milliSeconds );

    Job job = new Job( jobConf );
    job.setJarByClass( MPTU.class );

    String paths = args[0] + "," + args[1];
    FileInputFormat.setInputPaths( job, paths );
    Path outputDir = new Path( args[2] );
    outputDir.getFileSystem( jobConf ).delete( outputDir, true );
    FileOutputFormat.setOutputPath( job, outputDir );

    AvroJob.setInputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.LONG ), Schema.create( Type.STRING ) ) );
    AvroJob.setMapOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ),
                                                             Schema.create( Type.STRING ) ) );
    AvroJob.setOutputSchema( jobConf,
                             Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );

    job.setNumReduceTasks( 400 );
    job.submit();
    JobClient.runJob( jobConf );

The Exception:

13:31:39,268 ERROR UserGroupInformation:1335 - PriviledgedActionException as:msadri (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
    at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.run(MatchProfileTwitterUserHandler.java:58)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.main(MatchProfileTwitterUserHandler.java:81)

Correct me if my understanding is wrong: in the above code you are referring to "/Users/msadri/Documents/.....", which is on the local file system, isn't it? It seems that fs.defaultFS in core-site.xml is pointing to file:/// instead of the HDFS address of your cluster.

1) If you need to point to the local file system as per your requirement, then try this:

FileSystem.getLocal(conf).delete(outputDir, true);
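As a rough illustration, here is how that could look as a small standalone helper. This is only a sketch under the assumption that the job really is meant to write to the local file system; the class and method names (OutputDirCleanup, deleteLocalOutputDir) are made up for this example, and conf would be the jobConf from your driver.

    // A minimal sketch, not a drop-in fix: deletes an output directory on the
    // *local* file system regardless of what fs.defaultFS points to.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputDirCleanup {

        /** Recursively deletes outputPath on the local file system if it exists. */
        public static void deleteLocalOutputDir( Configuration conf, String outputPath ) throws IOException {
            Path outputDir = new Path( outputPath );
            FileSystem localFs = FileSystem.getLocal( conf );  // always the local FS
            if ( localFs.exists( outputDir ) ) {
                localFs.delete( outputDir, true );             // true = recursive
            }
        }
    }

You would then call something like OutputDirCleanup.deleteLocalOutputDir( jobConf, args[2] ) right before FileOutputFormat.setOutputPath(...).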

2) If it is expected to point to HDFS, then please check core-site.xml; in it, fs.defaultFS has to point to hdfs://<nameNode>:<port>/. Then try it once. (The error message says you are pointing to the local file system; if it were pointing to HDFS, it would say "Output directory hdfs://<nameNode>:<port>/Users/msadri/... already exists".)
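For reference, the relevant property in core-site.xml looks like the snippet below; the host and port are placeholders, not values from your cluster:

    <!-- core-site.xml: make HDFS the default file system.
         Replace host/port with your actual NameNode address. -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>

(On older Hadoop 1.x setups the equivalent, now-deprecated key is fs.default.name.)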

Rule this out if it's not necessary. Please let me know how it goes.

Can you try changing

 outputDir.getFileSystem( jobConf ).delete( outputDir, true );

to

FileSystem fs = FileSystem.get(jobConf);
fs.delete(outputDir, true);
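Put into the driver from the question, the change would look roughly like this (a sketch only; outputDir, args and jobConf are the names from your code, and for a scheme-less path both variants end up resolving against fs.defaultFS):

    // Sketch: resolve the file system from the job configuration (fs.defaultFS)
    // and remove any existing output directory before submitting the job.
    // Requires the org.apache.hadoop.fs.FileSystem / Path imports already used above.
    Path outputDir = new Path( args[2] );
    FileSystem fs = FileSystem.get( jobConf );
    if ( fs.exists( outputDir ) ) {
        fs.delete( outputDir, true );   // true = recursive delete
    }
    FileOutputFormat.setOutputPath( job, outputDir );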

You can try this too

Delete the output folder if it already exists.

You are getting the above exception because your output directory (/Users/msadri/Documents/files/linkage_output) already exists in the HDFS file system.

Just remember: while running a MapReduce job, specify an output directory that is not already there in HDFS. Please refer to the following instructions, which should help you resolve this exception.

To run a MapReduce job, you have to write a command similar to the one below:

$hadoop jar {name_of_the_jar_file.jar} {package_name_of_jar} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output

Just pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If you have already created this directory structure in your HDFS, then the Hadoop ecosystem will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException".

Solution: always specify a fresh output directory name at run time (Hadoop will create the directory automatically for you; you need not worry about creating it). As mentioned in the above example, the same command can be run in the following manner:

"hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"

So the output directory {crawler-output-1} will be created at runtime by the Hadoop ecosystem.
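If you would rather keep reusing the same output path, another option, consistent with the first answer, is to remove the existing directory from HDFS before rerunning the job; a plain hadoop fs command does this (the path is the one from the example above):

    # Remove the existing output directory (recursively) before rerunning the job
    hadoop fs -rm -r /home/facebook/crawler-output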

For more details, you can refer to: https://jhooq.com/hadoop-file-already-exists-exception/
