
Data placement and distribution in HDFS for heterogeneous Hadoop cluster

I have installed Apache Hadoop 2.x on 5 heterogeneous nodes, one of which is dedicated purely to the NameNode.

I am using the command below to put my input file into HDFS.

$ hdfs dfs -put /home/hduser/myspace/data /user/hduser/inputfile

HDFS replicates this input file across three DataNodes (DNs), which means the fourth DataNode holds no input block. If I use 8 mappers (by setting the split size with NLineInputFormat), will these 8 mappers be assigned across all 4 DNs? I think they should be. In that case, data blocks from the other DNs will have to be transferred to the 4th DN for the mappers assigned to it, which increases the overall execution time.
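As a side note, NLineInputFormat in Hadoop 2.x reads the mapreduce.input.lineinputformat.linespermap property, so the lines-per-split value can also be passed on the command line when the driver runs through ToolRunner. A minimal sketch, assuming a hypothetical myjob.jar and MyDriver class and an illustrative line count:

$ hadoop jar myjob.jar MyDriver \
    -D mapreduce.input.lineinputformat.linespermap=1000 \
    /user/hduser/inputfile /user/hduser/output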

My questions are:

  1. Can we somehow arrange to place data blocks on each DN so that a mapper on a particular DN never needs to fetch remote data? Can this be accomplished with the hdfs "put" command?

  2. Also, in the case of a heterogeneous cluster, can we place different amounts of data on different DNs depending on each node's computing power?

We cannot control on which DNs the data blocks are placed. You mentioned that HDFS replicates the file to 3 DNs, but this is true only if your file is smaller than the block size. HDFS stores data by dividing a file into multiple blocks, each of which is replicated, so there is a good chance the file's blocks are spread across all 4 DNs. For example, with a 128 MB block size and a replication factor of 3, a 512 MB file becomes 4 blocks stored as 12 block replicas distributed over the cluster.

Block placement is decided entirely by Hadoop, which manages it internally. You can only configure the replication factor via

dfs.replication

or the block size via

dfs.blocksize

(dfs.block.size in older releases) to approximate what you want.
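As a hedged sketch, both settings can also be applied per file at write time by passing them as generic options to the put command; the 64 MB block size and replication factor of 4 below are example values, not recommendations:

$ hdfs dfs -D dfs.blocksize=67108864 -D dfs.replication=4 \
    -put /home/hduser/myspace/data /user/hduser/inputfile

With 4 DataNodes, a replication factor of 4 places a copy of every block on each DN, which removes the remote reads described in the question at the cost of extra storage.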

If you want to check the block placement, you can open the HDFS Web UI at

Namenode:50070

and browse to the file; it will show the placement of the blocks among all the nodes.
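As an alternative to the Web UI, the fsck tool prints each block of a file together with the DataNodes holding its replicas (the path below is the one from the question):

$ hdfs fsck /user/hduser/inputfile -files -blocks -locations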
