
Restore file from HDFS after namenode delete

My namenode server was hacked this weekend and the /usr/local/hadoop directory no longer exists. Is it still possible to recover a file that is stored on HDFS? The datanodes are accessible, and each contains blk_{...} data files somewhere in its directory hierarchy.

If you don't have any copy/backup of the name dir, recovering the data will be quite a difficult task. The datanodes have no concept of files, only blocks. All of the data exists in those blocks, but you would have to manually reconstruct files from their blocks. If you have a few specific files of very high importance and not that much data overall, you may be able to sift through the blocks to find what you're looking for, but I'm not aware of anything better than that.
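For example, if a lost file is text, a rough sketch of that sifting might look like this on one datanode. The data-directory path and search string below are assumptions, not your actual values; check dfs.datanode.data.dir in hdfs-site.xml for the real location:

```sh
# Hedged sketch: hunting for a lost file's content in raw block files.
DATA_DIR=/data/dfs/dn        # hypothetical datanode data dir

# Finalized block files hold the actual file bytes; the adjacent
# blk_*.meta files are just checksums, so skip those.
find "$DATA_DIR" -type f -name 'blk_*' ! -name '*.meta' > /tmp/blocks.txt

# If the lost file is text, grep the blocks for a string you know it
# contains ('some-known-string' is a placeholder).
xargs -r grep -l 'some-known-string' < /tmp/blocks.txt > /tmp/matches.txt

# A file no bigger than one block (128 MB by default in Hadoop 2+) is
# exactly one blk_* file, so copying it out recovers it verbatim.
cp "$(head -n 1 /tmp/matches.txt)" /tmp/recovered_file

# Larger files were split across several blocks, possibly on different
# datanodes. The block ordering lived only in the namenode's metadata,
# so you must infer it from the content itself and then stitch the
# pieces together, e.g.: cat blk_A blk_B blk_C > recovered_file
```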

This is why there are a number of ways to redundantly store multiple copies of the namespace, e.g. by specifying multiple directories in the dfs.namenode.name.dir property, and by using either a Secondary or a Standby NameNode (see https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode ), which act as a remote location storing a copy of the namespace.
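For illustration, a minimal hdfs-site.xml fragment with redundant name directories might look like the following; the paths, including the NFS mount used for the off-machine copy, are hypothetical examples:

```xml
<!-- Hedged sketch: the namenode writes its fsimage and edit logs to
     every directory listed here, so losing one disk (or the whole
     machine, thanks to the NFS mount) does not lose the namespace.
     Paths are example assumptions. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/1/dfs/nn,file:///mnt/remote-nfs/dfs/nn</value>
</property>
```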
