
Distribution of Data in HDFS Hadoop

I configured 3 datanodes on my Linux machine. In my configuration, I set the replication factor to 1.

I submitted a file to HDFS and found that the file has 3 copies, one on each datanode (I checked it from the browser).

Isn't it right that I should see the file on only 1 datanode, as a single replica?

Before going into HDFS, the file will be split into blocks, and you should see one replica of each block on each datanode. The file as a whole won't be present on any single datanode.
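A quick way to verify this (a sketch; the path /user/hadoop/sample.txt is a made-up example) is to list the blocks of the file and the datanodes holding each replica, and to print the replication factor recorded for the file:

    # Show how the file was split into blocks and where each replica is stored
    hdfs fsck /user/hadoop/sample.txt -files -blocks -locations

    # Print the replication factor recorded for this file (%r = replication)
    hdfs dfs -stat "%r" /user/hadoop/sample.txt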

Please make sure that you have restarted the HDFS daemons after changing the replication factor property in the hdfs-site.xml file.
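A minimal sketch of the restart, assuming a standard installation where the sbin scripts are available. Note that dfs.replication only applies to files written after the change, so a file already in HDFS keeps its recorded replication factor unless you lower it explicitly:

    # Restart all HDFS daemons (NameNode, DataNodes, SecondaryNameNode)
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh

    # Reduce the replication factor of an existing file (path is a made-up example)
    hdfs dfs -setrep -w 1 /user/hadoop/sample.txt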

Also, it would be good if you could post a snapshot of your HDFS console.

I suspect that dfs.replication is set to 3 instead of 1.
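One quick way to see which value your configuration actually resolves to (a sketch; hdfs getconf reads the configuration visible to the client):

    # Print the effective value of dfs.replication (the Hadoop default is 3)
    hdfs getconf -confKey dfs.replication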

Make sure that the parameters below are set to 1 in your hdfs-site.xml (a sample snippet follows the list):

dfs.replication : Default block replication. The actual number of replicas can be specified when the file is created. The default is used if replication is not specified at create time.

dfs.namenode.replication.min : Minimal block replication.
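A minimal hdfs-site.xml sketch with both properties set to 1 (the values shown are assumptions for a single-replica setup; merge the properties into your existing configuration rather than replacing it):

    <configuration>
      <!-- Default replication factor for newly created files -->
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <!-- Minimum number of replicas the NameNode requires per block -->
      <property>
        <name>dfs.namenode.replication.min</name>
        <value>1</value>
      </property>
    </configuration>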

Have a look at the documentation for more details.
