How to delete intermediate output files from HDFS

I am trying to delete the intermediate output directory of a MapReduce program using

FileUtils.deleteDirectory(new File(tempFiles));

but this call doesn't delete directories from HDFS.

MapReduce does not write intermediate results to HDFS; it writes them to local disk.

Whenever a mapper produces output, it first goes into an in-memory buffer, where partitioning and sorting take place. When the buffer fills past its configured threshold, the results are spilled to local disk.

In summary, the output produced by a mapper goes to the local file system.

Mappers write their output to HDFS in only one case: when the driver class explicitly configures the job to use no reducers.

In that case the output is the job's final output, so we would not call it intermediate.
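As a rough illustration of the map-only case described above, a driver can set the number of reduce tasks to zero. This is only a sketch: the class name `MapOnlyDriver` and the job name are made up for the example, and a real driver would also set the mapper class and the input and output paths before submitting.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment; not taken from the original question.
public class MapOnlyDriver {

    // Builds a job with zero reducers, so mapper output is written
    // straight to the job's output directory instead of being spilled
    // to local disk for a shuffle.
    static Job newMapOnlyJob(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setNumReduceTasks(0);
        return job;
    }

    public static void main(String[] args) throws IOException {
        Job job = newMapOnlyJob(new Configuration());
        System.out.println("reduce tasks: " + job.getNumReduceTasks());
    }
}
```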

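You are using the wrong API! You should be using Hadoop's FileUtil (org.apache.hadoop.fs.FileUtil) instead of Commons IO's FileUtils. The latter is used for file manipulation on local file systems. Note that FileUtil.fullyDelete(File) also works on the local disk only; the overload that reaches HDFS takes a FileSystem and a Path.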

https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileUtil.html#fullyDelete
http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/FileUtils.html

I understand that it is easy to pick the wrong one because the names are so similar. Your current code looks in your local file system for that path, so the delete has no effect on HDFS.
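To see why a `java.io.File`-based call never reaches HDFS, here is a small JDK-only sketch. The NameNode address and directory are hypothetical, and the question does not say whether `tempFiles` carries an `hdfs://` scheme; either way, `File` resolves the string against the local disk.

```java
import java.io.File;
import java.net.URI;

public class LocalVsHdfsPath {

    // Returns true when the string carries an explicit hdfs:// scheme.
    static boolean isHdfsUri(String location) {
        String scheme = URI.create(location).getScheme();
        return "hdfs".equalsIgnoreCase(scheme);
    }

    public static void main(String[] args) {
        // Hypothetical NameNode address and directory, for illustration only.
        String tempFiles = "hdfs://namenode:8020/tmp/intermediate";

        // java.io.File knows nothing about URI schemes: it treats the
        // whole string as a relative path name on the local disk.
        File asLocalFile = new File(tempFiles);
        System.out.println("absolute local path? " + asLocalFile.isAbsolute());
        System.out.println("exists locally?      " + asLocalFile.exists());
        System.out.println("is an HDFS URI?      " + isHdfsUri(tempFiles));
    }
}
```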

Sample code:

// Use the (FileSystem, Path) overload; fullyDelete(File) only touches the local disk.
FileUtil.fullyDelete(filesystem, new Path("pathToDir"));

Alternatively, you can use the FileSystem API itself, which has a delete method. You need to obtain the FileSystem object first, e.g.:

filesystem.delete(new Path("pathToDir"), true); 

The second argument is the recursive flag.
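Putting the pieces together, a minimal sketch of obtaining the FileSystem and deleting a directory recursively might look like the following. The class name `HdfsCleanup`, the helper `deleteRecursively`, and the default path are assumptions for illustration; the sketch also assumes the cluster settings (for example `fs.defaultFS` in `core-site.xml`) are on the classpath so that an unqualified path resolves to HDFS.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class; not part of the original answer.
public class HdfsCleanup {

    // Deletes dir and everything under it on the file system that the
    // path's scheme (or fs.defaultFS) resolves to. Returns false when
    // the path does not exist.
    static boolean deleteRecursively(Configuration conf, String dir) throws IOException {
        Path path = new Path(dir);
        FileSystem fs = path.getFileSystem(conf);
        if (!fs.exists(path)) {
            return false;
        }
        return fs.delete(path, true); // true = recursive
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Placeholder directory; pass the real intermediate path as args[0].
        String tempFiles = args.length > 0 ? args[0] : "/tmp/intermediate";
        System.out.println("deleted: " + deleteRecursively(conf, tempFiles));
    }
}
```

Resolving the FileSystem from the Path means the same helper also works for `file://` paths, which is convenient when trying it out without a cluster.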
