I'm running a MapReduce job with the following driver code, and it keeps throwing the exception below. I made sure to delete the output folder before starting the job, but it still fails.
The code:
JobConf jobConf = new JobConf( getConf(), MPTU.class );
jobConf.setJobName( "MPTU" );
AvroJob.setMapperClass( jobConf, MPTUMapper.class );
AvroJob.setReducerClass( jobConf, MPTUReducer.class );
long milliSeconds = 1000 * 60 * 60;
jobConf.setLong( "mapred.task.timeout", milliSeconds );
Job job = new Job( jobConf );
job.setJarByClass( MPTU.class );
String paths = args[0] + "," + args[1];
FileInputFormat.setInputPaths( job, paths );
Path outputDir = new Path( args[2] );
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
FileOutputFormat.setOutputPath( job, outputDir );
AvroJob.setInputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.LONG ), Schema.create( Type.STRING ) ) );
AvroJob.setMapOutputSchema( jobConf, Pair.getPairSchema( Schema.create( Type.STRING ),
Schema.create( Type.STRING ) ) );
AvroJob.setOutputSchema( jobConf,
Pair.getPairSchema( Schema.create( Type.STRING ), Schema.create( Type.STRING ) ) );
job.setNumReduceTasks( 400 );
job.submit();
JobClient.runJob( jobConf );
The Exception:
13:31:39,268 ERROR UserGroupInformation:1335 - PriviledgedActionException as:msadri (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/Users/msadri/Documents/files/linkage_output already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:937)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1319)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.run(MatchProfileTwitterUserHandler.java:58)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.reunify.socialmedia.RecordLinkage.MatchProfileTwitterUserHandler.main(MatchProfileTwitterUserHandler.java:81)
Correct me if my understanding is wrong: in the code above you are referring to "/Users/msadri/Documents/.....", which is on the local file system, isn't it? It seems that fs.defaultFS in core-site.xml points to file:/// instead of the HDFS address of your cluster.
1) If you really need to point to the local file system, try this:
FileSystem.getLocal(jobConf).delete(outputDir, true);
2) If it is supposed to point to HDFS, check core-site.xml: fs.defaultFS has to point to hdfs://<nameNode>:<port>/. Then try again. (The error message shows that you are pointing to the local file system; if it were pointing to HDFS, it would say "Output directory hdfs://<nameNode>:<port>/Users/msadri/... already exists".)
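To see why the message starts with file: rather than hdfs://, here is a simplified, stdlib-only model of how a path without a scheme picks up the scheme and authority of fs.defaultFS. It is a sketch under that assumption, not Hadoop's actual Path.makeQualified code; the class name is hypothetical and namenode:8020 is a placeholder NameNode address.

```java
import java.net.URI;

public class QualifyPathSketch {
    // Simplified model (assumption, not Hadoop source): a path with no scheme
    // inherits the scheme and authority of the default file system URI.
    static URI qualify(String defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already fully qualified, e.g. hdfs://host:8020/out
        }
        return URI.create(defaultFs).resolve(p);
    }

    public static void main(String[] args) {
        String out = "/Users/msadri/Documents/files/linkage_output";
        // fs.defaultFS = file:/// (local file system)
        System.out.println(qualify("file:///", out));
        // fs.defaultFS = hdfs://namenode:8020/ (hypothetical NameNode host and port)
        System.out.println(qualify("hdfs://namenode:8020/", out));
    }
}
```

With file:/// as the default, the output path becomes a file URI, the same prefix that appears in the exception; with the placeholder HDFS default it becomes hdfs://namenode:8020/Users/msadri/Documents/files/linkage_output.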
Rule this out if it doesn't apply, and let me know how it goes.
Can you try changing
outputDir.getFileSystem( jobConf ).delete( outputDir, true );
to
FileSystem fs = FileSystem.get(jobConf);
fs.delete(outputDir, true);
You can also try deleting the output folder if it already exists.
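Because the exception reports a file: URI, the directory in this case lives on the local disk. For that local case only, a minimal sketch of "delete the output folder if it already exists" could look like this. It uses plain java.nio.file rather than Hadoop's FileSystem API, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteOutputIfExists {
    // Recursively deletes dir on the local file system.
    // Returns false if the directory did not exist, true if it was removed.
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts every child before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: DeleteOutputIfExists <outputDir>");
            return;
        }
        // Plays the same role as args[2] in the question's driver.
        Path outputDir = Paths.get(args[0]);
        System.out.println("deleted: " + deleteIfExists(outputDir));
    }
}
```

Call deleteIfExists (or run the class) before the job is submitted. Sorting the walked paths in reverse order guarantees that files such as part-00000 are removed before the directories that contain them.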
You are getting the above exception because your output directory (/Users/msadri/Documents/files/linkage_output) already exists in the file system the job writes to.
Just remember: when running a MapReduce job, do not specify an output directory that already exists in HDFS. The following instructions should help you resolve this exception.
To run a MapReduce job, you use a command similar to this one:
$ hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}
Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output
Pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in HDFS, Hadoop will throw "org.apache.hadoop.mapred.FileAlreadyExistsException".
Solution: always specify a new output directory name at run time. Hadoop creates the directory for you, so you do not need to create it yourself. The command from the example above can be run like this:
hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1
The output directory crawler-output-1 is then created at run time by Hadoop.
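If you would rather not edit the command by hand each time, the next free name can be derived automatically. A minimal sketch, assuming the output root is on the local file system and a hypothetical -1, -2, ... suffix scheme; it only picks the name and leaves creating the directory to Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FreshOutputDir {
    // Returns base if it does not exist yet; otherwise the first free sibling
    // named base-1, base-2, ... The directory itself is NOT created here.
    static Path nextFreeOutputDir(Path base) {
        if (!Files.exists(base)) {
            return base;
        }
        for (int i = 1; ; i++) {
            Path candidate = base.resolveSibling(base.getFileName() + "-" + i);
            if (!Files.exists(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical default name, mirroring the crawler-output example.
        Path base = Paths.get(args.length > 0 ? args[0] : "crawler-output");
        System.out.println(nextFreeOutputDir(base));
    }
}
```

The existence check and the later job submission are not atomic, so two runs started at the same moment could pick the same name. For an output root on HDFS, the same check would go through Hadoop's FileSystem.exists instead of java.nio.file.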
For more details, see: https://jhooq.com/hadoop-file-already-exists-exception/