I am running a simple, get-my-feet-wet MapReduce job in pseudo-distributed mode like so:
bin/hadoop jar tm.jar TestMap input output
It ran fine the first time, but on the second run I get the following:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/tom/output already exists
The HDFS directories were initially created with these commands:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
A few questions:
When running a MapReduce job, Hadoop expects the output directory to be non-existent. The first run of the job created it, and re-running the job with the same output path caused this exception.
From your post, the output directory is provided as a relative path; in that case the directory is created inside the user's HDFS home directory (/user/<username>/output).
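If you want to see exactly where a relative path lands, you can qualify it from a small client program. This is just an illustrative sketch (the ShowQualifiedPath class name is mine, not from your job), assuming fs.defaultFS points at hdfs://localhost:9000 as in your exception message:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowQualifiedPath {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // A relative path is resolved against the user's HDFS home directory,
        // so "output" becomes hdfs://localhost:9000/user/<username>/output.
        System.out.println(fs.makeQualified(new Path("output")));
        System.out.println(fs.getHomeDirectory());
    }
}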
Yes, you can delete the directory if it already exists and you no longer need its contents:
hdfs dfs -rm -R output
To avoid this, you can either delete the directory before submitting your job, or provide a different, non-existent path as the job's output.
Note: For example, if the provided output path is new/mapreduce/output, Hadoop expects the parent new/mapreduce/ to exist.
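If you prefer to handle the cleanup in the driver itself rather than from the shell, the usual pattern is to delete a leftover output directory before submitting the job. Below is a minimal sketch; the TestMap class name comes from your command, but the mapper/reducer wiring is elided because it is not shown in the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TestMap {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        // Remove output left over from a previous run so that
        // FileOutputFormat's existence check does not fail the job.
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(out)) {
            fs.delete(out, true); // true = recursive
        }

        Job job = Job.getInstance(conf, "test map");
        job.setJarByClass(TestMap.class);
        // ... set mapper, reducer, and output key/value classes here ...
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that fs.delete(out, true) is destructive: only do this when the previous run's output is disposable.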