How to delete files from HDFS?

I just downloaded the Hortonworks sandbox VM, which ships with Hadoop 2.7.1. I add some files using the

hadoop fs -put /hw1/* /hw1

...command. After that I delete the added files with the

hadoop fs -rm /hw1/*

...command, and then empty the trash with the

hadoop fs -expunge

...command. But the DFS Remaining space does not change after the trash is emptied, even though I can see that the data really was deleted from /hw1/ and from the trash. I have fs.trash.interval set to 1.

In fact, I can still find all my data, split into chunks, in the /hadoop/hdfs/data/current/BP-2048114545-10.0.2.15-1445949559569/current/finalized/subdir0/subdir2 folder. This really surprises me, because I expected it to be deleted.

So my question is: how do I delete the data so that it is really removed? After a few rounds of adding and deleting, I ran out of free space.

Try hadoop fs -rm -R URI

The -R option deletes the directory and any content under it recursively.

Your problem comes down to how HDFS works at a basic level. In HDFS (and in many other file systems) physically deleting files is not the fastest operation. HDFS is a distributed file system and usually keeps at least 3 replicas of each file on different servers, so after you request a deletion, every replica (which may consist of many blocks on different hard drives) has to be deleted in the background.

The official Hadoop documentation says the following:

The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
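To watch that delay for yourself, you could poll the cluster report until the free space actually goes up. This is only a rough sketch: `parse_dfs_remaining` and `wait_for_space` are made-up helper names, the 30-second interval and 40 attempts are arbitrary defaults, and it assumes the `hdfs dfsadmin -report` output contains a line of the form `DFS Remaining: <bytes> (...)`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Hadoop.

# Read report text on stdin and print the first "DFS Remaining" byte count
# (the cluster-wide summary comes before the per-datanode sections).
parse_dfs_remaining() {
  awk '/^DFS Remaining:/ { print $3; exit }'
}

# Poll the report until DFS Remaining grows, or give up.
# $1 = seconds between polls (assumed default 30)
# $2 = maximum number of polls (assumed default 40)
wait_for_space() {
  interval=${1:-30}
  attempts=${2:-40}
  before=$(hdfs dfsadmin -report | parse_dfs_remaining)
  if [ -z "$before" ]; then
    echo "could not read DFS Remaining from the report" >&2
    return 2
  fi
  i=0
  while [ "$i" -lt "$attempts" ]; do
    sleep "$interval"
    now=$(hdfs dfsadmin -report | parse_dfs_remaining)
    if [ -n "$now" ] && [ "$now" -gt "$before" ]; then
      echo "DFS Remaining grew from $before to $now bytes"
      return 0
    fi
    i=$((i + 1))
  done
  echo "DFS Remaining still $before bytes after $attempts polls" >&2
  return 1
}
```

Run the delete and `hadoop fs -expunge` first, then call `wait_for_space` (for example `wait_for_space 60 20`) as a user allowed to run `hdfs dfsadmin -report`.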

You can use

hdfs dfs -rm -R /path/to/HDFS/file

since hadoop dfs has been deprecated.

What worked for me:

hadoop fs -rm -R <your Directory>

If you also need to skip the trash, the following command worked for me:

hdfs dfs -rm -R -skipTrash /path/to/HDFS/file
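Since -skipTrash leaves no way to recover the files, it may be worth wrapping the command so you can preview it first. The sketch below is just one way to do that: `hdfs_purge` and the `DRY_RUN` variable are names invented for this example, and the only Hadoop command it relies on is the `hdfs dfs -rm -R -skipTrash` shown above.

```shell
#!/bin/sh
# hdfs_purge is a hypothetical helper name, not part of Hadoop.
# With DRY_RUN=1 it only prints the command it would run.
hdfs_purge() {
  if [ "$#" -eq 0 ]; then
    echo "usage: hdfs_purge PATH..." >&2
    return 2
  fi
  # Refuse obviously dangerous targets before touching anything.
  for target in "$@"; do
    case "$target" in
      /|"")
        echo "refusing to delete '$target'" >&2
        return 1
        ;;
    esac
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "hdfs dfs -rm -R -skipTrash $*"
  else
    hdfs dfs -rm -R -skipTrash "$@"
  fi
}
```

For example, set `DRY_RUN=1` and call `hdfs_purge /hw1` to see the command, then unset it and call `hdfs_purge /hw1` again to actually delete.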

Durga Viswanath Gadiraju is right, it is a question of time. Maybe my PC is slow, and it is also running a VM, but about 10 minutes after following the steps from my question the files were physically deleted. Note: set the fs.trash.interval parameter to 1; otherwise, by default, files won't be deleted for at least 6 hours.
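If you are not sure what your trash interval currently is, you can read it with `hdfs getconf -confKey fs.trash.interval` (the value is in minutes, and 0 means the trash is disabled). The small helper below just turns that number into a readable message; `describe_trash_interval` is a name made up for this sketch, and the value you read is only what the configuration visible to your client says.

```shell
#!/bin/sh
# describe_trash_interval is a hypothetical helper, not part of Hadoop.
# $1 = value of fs.trash.interval, in minutes.
describe_trash_interval() {
  minutes=$1
  # Reject anything that is not a plain non-negative integer.
  case "$minutes" in
    ''|*[!0-9]*)
      echo "not a number: $minutes" >&2
      return 2
      ;;
  esac
  if [ "$minutes" -eq 0 ]; then
    echo "trash disabled: -rm bypasses .Trash"
  else
    echo "trash enabled: files stay in .Trash for about $minutes minute(s)"
  fi
}
```

Usage on a node with the Hadoop client installed: `describe_trash_interval "$(hdfs getconf -confKey fs.trash.interval)"`.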
