
Where are HDFS directories created in Hadoop?

I am running a simple, get-my-feet-wet MapReduce job in pseudo-distributed mode, like so:

bin/hadoop jar tm.jar TestMap input output

It ran fine the first time but on the second run, I am getting the following:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/tom/output already exists

The HDFS directories were initially created with these hadoop commands:

 $ bin/hdfs dfs -mkdir /user
 $ bin/hdfs dfs -mkdir /user/<username>

A few questions:

  • Where are these HDFS directories created, and can they be deleted if they already exist?
  • What's the best practice for avoiding this?

When running a MapReduce job, Hadoop expects the output directory to not already exist.

The first run of the job created it, and rerunning the job with the same output path caused this exception.

From your post, the output directory is given as a relative path; in that case the directory is created inside the user's HDFS home directory (/user/<username>/output).
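You can confirm this with hdfs dfs -ls; assuming the default home directory layout, both of these list the same directory:

hdfs dfs -ls output
hdfs dfs -ls /user/<username>/output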

Yes, you can delete them if they already exist and you do not need them anymore.

hdfs dfs -rm -R output

To avoid this, you can either delete the directory before submitting your job, or provide a different, non-existent path as the output for each run.
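For example (a minimal sketch, assuming the same jar, class, and input path as in your command above):

# remove the previous output first; -f avoids an error if it does not exist
hdfs dfs -rm -r -f output
hadoop jar tm.jar TestMap input output

# ...or write to a fresh output directory on every run instead
hadoop jar tm.jar TestMap input output-$(date +%s)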

Note: if the provided output path is new/mapreduce/output, Hadoop expects the parent new/mapreduce/ to already exist.
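In that case you can create the missing parents up front with -mkdir -p (again assuming the same jar and input as above):

# -p creates any missing parent directories, like mkdir -p on a local filesystem
hdfs dfs -mkdir -p new/mapreduce
hadoop jar tm.jar TestMap input new/mapreduce/output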
