
How to delete intermediate output files from HDFS

I am trying to delete the intermediate output directory of a MapReduce program using

FileUtils.deleteDirectory(new File(tempFiles));

but this call does not delete the directory from HDFS.

MapReduce does not write intermediate results to HDFS; it writes them to local disk.

Whenever a mapper produces output, the output first goes into an in-memory buffer, where partitioning and sorting take place. When the buffer fills past its spill threshold (by default 80% of a 100 MB buffer), the contents are spilled to local disk.

In summary, the output produced by a mapper goes to the local file system.

Mappers write their output to HDFS in only one case: when the driver class has explicitly configured the job to use no reducers.

In that case the mapper output is the final output of the job, so we would not call it intermediate.
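To make the "no reducer" case concrete, here is a minimal sketch of a driver fragment that sets the reducer count to zero. It assumes the Hadoop MapReduce client libraries are on the classpath; the class name `MapOnlyDriver` and the job name are hypothetical, and a real driver would still need a mapper class plus input and output paths.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: only the reducer setting is shown.
class MapOnlyDriver {

    /** Builds a job whose mappers write straight to the job's output directory. */
    static Job newMapOnlyJob(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "map-only-example"); // job name is illustrative
        // Zero reducers: there is no shuffle/sort phase, so mapper output is the final output.
        job.setNumReduceTasks(0);
        return job;
    }

    public static void main(String[] args) throws IOException {
        Job job = newMapOnlyJob(new Configuration());
        System.out.println("reducers configured: " + job.getNumReduceTasks());
    }
}
```

With any non-zero reducer count, the mapper output stays on the local disks of the worker nodes and is not something you would delete through an HDFS path.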

You are using the wrong API. You should be using Apache Hadoop's FileUtil instead of Commons IO's FileUtils. The latter is meant for file manipulation on local filesystems.

https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html#fullyDelete
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html

I understand that it is easy to pick the wrong one because the names are so similar. Your current code looks for that path on your local file system, so it has no effect on HDFS.
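A small sketch of why a java.io.File-based call never reaches HDFS: `File` treats the whole string as a local path name and knows nothing about URI schemes. The NameNode address and directory below are hypothetical, and the class name `LocalVsHdfsPath` is only for illustration.

```java
import java.io.File;
import java.net.URI;

class LocalVsHdfsPath {

    /** True if the string parses as a URI whose scheme is "hdfs". */
    static boolean isHdfsUri(String location) {
        return "hdfs".equalsIgnoreCase(URI.create(location).getScheme());
    }

    public static void main(String[] args) {
        // Hypothetical location of the intermediate directory.
        String tempFiles = "hdfs://namenode:8020/tmp/job-intermediate";

        // java.io.File has no notion of a scheme; this is just a relative local path.
        File asLocalFile = new File(tempFiles);

        System.out.println("is an hdfs:// URI:    " + isHdfsUri(tempFiles));
        System.out.println("absolute local path:  " + asLocalFile.isAbsolute());
        System.out.println("resolved locally to:  " + asLocalFile.getAbsolutePath());
        System.out.println("exists on local disk: " + asLocalFile.exists());
    }
}
```

Any API that accepts a `java.io.File` will resolve the string the same way, against the local working directory, regardless of which library the method comes from.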

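Sample code:

// fullyDelete(File) also deletes only on the local disk; this (deprecated) overload takes a FileSystem and a Path.
FileUtil.fullyDelete(filesystem, new Path("pathToDir"));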

Alternatively, you can use the FileSystem API itself, which has a delete method. You need to obtain a FileSystem object first, e.g.:

filesystem.delete(new Path("pathToDir"), true); 

The second argument is the recursive flag.
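For completeness, here is a minimal sketch of obtaining the FileSystem object and calling delete on it. It assumes the Hadoop client libraries are on the classpath; the class name `HdfsCleanup`, the NameNode URI and the directory are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HdfsCleanup {

    /** Recursively deletes dir on the given file system; returns true if something was deleted. */
    static boolean deleteIfExists(FileSystem fs, Path dir) throws IOException {
        if (!fs.exists(dir)) {
            return false; // nothing to clean up
        }
        // Second argument is the recursive flag; it must be true for a non-empty directory.
        return fs.delete(dir, true);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults: replace with your NameNode URI and temp directory.
        String uri = args.length > 0 ? args[0] : "hdfs://namenode:8020";
        String tempFiles = args.length > 1 ? args[1] : "/tmp/job-intermediate";

        Configuration conf = new Configuration();
        // FileSystem.get may return a shared, cached instance, so it is not closed here.
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        boolean deleted = deleteIfExists(fs, new Path(tempFiles));
        System.out.println(deleted ? "deleted " + tempFiles : "nothing to delete at " + tempFiles);
    }
}
```

Inside a driver that already has a Job or Configuration, the same FileSystem can usually be obtained from that configuration instead of hard-coding the URI.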
