How does communication between datanodes work in a Hadoop cluster?
I am new to Hadoop, and help with this question is appreciated.
Block replication in a cluster is carried out by the individual DataNodes that hold a copy of the block, but how does this transfer take place without going through the NameNode?
I found that SSH is set up from the slaves to the master and from the master to the slaves, but not from slave to slave.
Could someone explain?
Is it done through the Hadoop data transfer protocol, as with client-to-DataNode communication?
http://blog.cloudera.com/blog/2013/03/how-to-set-up-a-hadoop-cluster-with-network-encryption/
After digging into the Hadoop source code, I found that DataNodes use the BlockSender class to transfer block data. Under the hood it is a plain Socket.
Here is the rough way I tracked this down (Hadoop version 1.1.2 is used here).
The relevant code is where the DataNode sends a heartbeat to the NameNode, mainly to report that it is alive. The return value is a set of commands that the DataNode will then process. This is where block copying happens.
A comment in the DataNode source makes it clear that transferBlocks is the method we are looking for.
new Daemon(new DataTransfer(xferTargets, block, this)).start();
So we know that the DataNode starts a new thread to do the block copy.
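The flow described so far (heartbeat reply carries commands, each transfer command is handed to its own daemon thread) can be sketched roughly as follows. This is only an illustration of the pattern, not Hadoop's code: HeartbeatSketch, Command, process and awaitAll are hypothetical names, and the "transfer" here merely records what it would have copied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; none of these types are Hadoop classes.
final class HeartbeatSketch {
    /** One command from a heartbeat reply (assumed shape, for illustration only). */
    static final class Command {
        final String action;  // e.g. "TRANSFER"
        final long blockId;
        final String target;  // label for the destination datanode
        Command(String action, long blockId, String target) {
            this.action = action;
            this.blockId = blockId;
            this.target = target;
        }
    }

    /** Starts one daemon thread per TRANSFER command and returns the started threads. */
    static List<Thread> process(List<Command> reply, List<String> log) {
        List<Thread> started = new ArrayList<>();
        for (Command c : reply) {
            if (!"TRANSFER".equals(c.action)) {
                continue; // other command types are ignored in this sketch
            }
            // Stand-in for the real copy: just record block id and target.
            Thread t = new Thread(() -> log.add(c.blockId + "->" + c.target));
            t.setDaemon(true); // a daemon thread does not keep the JVM alive
            t.start();
            started.add(t);
        }
        return started;
    }

    /** Waits for all threads, restoring the interrupt flag if interrupted. */
    static void awaitAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        List<Command> reply = Arrays.asList(
                new Command("TRANSFER", 42L, "dn2:50010"),
                new Command("NOOP", 0L, ""));
        awaitAll(process(reply, log));
        System.out.println(log);
    }
}
```

The point of the separate thread is that the heartbeat loop is not blocked while a block is being copied.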
// send data & checksum
blockSender.sendBlock(out, baseStream, null);
From the sendBlock call above, we can see that BlockSender is what does the actual work.
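To make "send data and checksum over a Socket" concrete, here is a minimal, self-contained sketch. It is not BlockSender: the class BlockSocketSketch, its methods, and the frame layout (length, one CRC32 for the whole payload, then the bytes) are assumptions made for illustration, and the real protocol is more involved.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of checksummed transfer over a socket; not Hadoop's wire format.
final class BlockSocketSketch {
    /** Sender side: write the length, a CRC32 checksum, then the bytes. */
    static void sendBlock(Socket socket, byte[] block) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        DataOutputStream out = new DataOutputStream(socket.getOutputStream());
        out.writeInt(block.length);
        out.writeLong(crc.getValue());
        out.write(block);
        out.flush();
    }

    /** Receiver side: read one frame and verify its checksum. */
    static byte[] receiveBlock(Socket socket) throws IOException {
        DataInputStream in = new DataInputStream(socket.getInputStream());
        int length = in.readInt();
        long expected = in.readLong();
        byte[] block = new byte[length];
        in.readFully(block);
        CRC32 crc = new CRC32();
        crc.update(block, 0, block.length);
        if (crc.getValue() != expected) {
            throw new IOException("checksum mismatch");
        }
        return block;
    }

    /**
     * Sends a small payload over a loopback connection and returns what the
     * receiver read. Single-threaded, so the payload must fit in the socket buffers.
     */
    static byte[] roundTrip(byte[] block) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket sender = new Socket(loopback, server.getLocalPort());
             Socket receiver = server.accept()) {
            sendBlock(sender, block);
            return receiveBlock(receiver);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello block".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(roundTrip(data), StandardCharsets.UTF_8));
    }
}
```

Note that no SSH is involved anywhere in this kind of transfer; the two ends only need a TCP connection between them.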
That is as far as I went. It is up to you to dig further, for example into BlockReader.
Whenever a block has to be written to HDFS, the NameNode chooses a DataNode to hold the block. It also chooses other DataNodes to hold the replicas of this block. The block is then written to the first DataNode, which in turn replicates it to the other DataNodes that were selected for the replicas.
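As a rough in-memory model of that description: a placement step picks the target nodes, the block is written to the first one, and each node forwards it to the next. Everything here is hypothetical (PipelineSketch, Node, chooseTargets), there is no networking, and the placement rule is a trivial stand-in, not the NameNode's actual policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of pipelined replication; not Hadoop code.
final class PipelineSketch {
    static final class Node {
        final String name;
        final Map<Long, byte[]> blocks = new HashMap<>();
        Node(String name) {
            this.name = name;
        }

        /** Stores the block locally, then forwards it to the next node, if any. */
        void write(long blockId, byte[] data, List<Node> downstream) {
            blocks.put(blockId, data.clone());
            if (!downstream.isEmpty()) {
                Node next = downstream.get(0);
                next.write(blockId, data, downstream.subList(1, downstream.size()));
            }
        }
    }

    /** Stand-in for target selection: simply takes the first `replication` nodes. */
    static List<Node> chooseTargets(List<Node> cluster, int replication) {
        int n = Math.min(replication, cluster.size());
        return new ArrayList<>(cluster.subList(0, n));
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
                new Node("dn1"), new Node("dn2"), new Node("dn3"), new Node("dn4"));
        List<Node> targets = chooseTargets(cluster, 3);
        // Write to the first target only; it passes the block down the chain.
        targets.get(0).write(7L, new byte[] {1, 2, 3}, targets.subList(1, targets.size()));
        for (Node node : cluster) {
            System.out.println(node.name + " has block 7: " + node.blocks.containsKey(7L));
        }
    }
}
```

The writer only talks to the first node; the remaining copies are made node to node, which is why no slave-to-slave SSH setup is needed for replication.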