
Where are HDFS directories created in Hadoop?

I am running a simple, get-my-feet-wet MapReduce job in pseudo-distributed mode, like so:

bin/hadoop jar tm.jar TestMap input output

It ran fine the first time but on the second run, I am getting the following:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/tom/output already exists

The HDFS directories were initially created with these hadoop commands:

 $ bin/hdfs dfs -mkdir /user
 $ bin/hdfs dfs -mkdir /user/<username>

A few questions:

  • Where are these HDFS directories created, and can they be deleted if they already exist?
  • What's the best practice for avoiding this?

When running a MapReduce job, Hadoop expects the output directory to not already exist.

The first run of the job created it, and rerunning the job with the same output path caused this exception.

From your post, the output directory is given as a relative path; in that case the directory is created inside the user's HDFS home directory (/user/<username>/output).
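You can confirm this with hdfs dfs -ls; assuming the default home directory layout, both of these list the same directory:

hdfs dfs -ls output
hdfs dfs -ls /user/<username>/output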

Yes, you can delete them if they already exist and you do not need them anymore.

hdfs dfs -rm -R output

To avoid this, you can either delete the directory before submitting your job, or provide a different, non-existent path as the output for each run.
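For example (a minimal sketch, assuming the same jar, class, and input path as in your command above):

# remove the previous output first; -f avoids an error if it does not exist
hdfs dfs -rm -r -f output
hadoop jar tm.jar TestMap input output

# ...or write to a fresh output directory on every run instead
hadoop jar tm.jar TestMap input output-$(date +%s)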

Note: if the provided output path is new/mapreduce/output, Hadoop expects the parent new/mapreduce/ to already exist.
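In that case you can create the missing parents up front with -mkdir -p (again assuming the same jar and input as above):

# -p creates any missing parent directories, like mkdir -p on a local filesystem
hdfs dfs -mkdir -p new/mapreduce
hadoop jar tm.jar TestMap input new/mapreduce/output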
