
When copying a file to HDFS, how to control what nodes that file will reside on?

I'm dealing with kind of a bizarre use case where I need to make sure that File A is local to Machine A, File B is local to Machine B, etc. When copying a file to HDFS, is there a way to control which machines that file will reside on? I know that any given file will be replicated across three machines, but I need to be able to say "File A will DEFINITELY exist on Machine A". I don't really care about the other two machines -- they could be any machines on my cluster.

Thank you.

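I don't think so, because usually when a file is larger than 64 MB (the block size), the primary copies of its blocks will reside on multiple servers.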

HDFS is a distributed file system that is specific to the cluster (one machine or many machines), and once a file is in HDFS you lose the concept of the machine or machines underneath. That abstraction is what makes it useful. If a file is bigger than the block size, it is cut into block-sized pieces and, based on the replication factor, those blocks are copied to other machines in your cluster. Those blocks move based on

In your case, if you have a 3-node cluster (plus 1 main NameNode), your source file is 1 MB, your block size is 64 MB, and your replication factor is 3, then the single block holding your 1 MB file will have 3 copies, one on each of the 3 nodes; from the HDFS perspective, however, you still have only 1 file. Once a file is copied to HDFS, you really don't consider the machine factor, because at the machine level there is no file, only file blocks.
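The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the block-count reasoning above, not anything HDFS itself runs; the function name `block_layout` and the 200 MB file are made up for the example.

```python
MB = 1024 * 1024

def block_layout(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Return (number of blocks, total block replicas) for a single file.

    Hypothetical helper: it only mirrors the arithmetic described above.
    """
    if file_size_bytes <= 0:
        return 0, 0
    # Ceiling division: a partially filled last block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    return blocks, blocks * replication

# The 1 MB file from the example: one block, three replicas.
print(block_layout(1 * MB))    # (1, 3)

# An assumed 200 MB file: four blocks, twelve replicas spread over the cluster.
print(block_layout(200 * MB))  # (4, 12)
```

With more than one block per file, each block's replicas are placed independently, which is why a large file generally cannot be said to live on one particular machine.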

If you really want to make sure, for whatever reason, what you can do is set the replication factor to 1 and run a 1-node cluster, which will guarantee your bizarre requirement.

Finally, you can always use the FSimage viewer tools in your Hadoop cluster to see where the file blocks are located. More details are located here.
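As a different route from the FSimage viewer mentioned above, the `hdfs fsck` command can list a file's blocks and the DataNodes holding each replica when given the `-files -blocks -locations` options. Below is a minimal Python sketch that builds and runs that command. It assumes the `hdfs` CLI is on the PATH of a machine configured for your cluster, and the path `/user/example/fileA` is a hypothetical placeholder.

```python
import shlex
import subprocess

def fsck_command(hdfs_path):
    """Build the argv for an fsck report that includes block locations."""
    return ["hdfs", "fsck", hdfs_path, "-files", "-blocks", "-locations"]

def show_block_locations(hdfs_path):
    """Run fsck and return its raw text report.

    Assumes the `hdfs` client is installed and can reach the cluster.
    """
    result = subprocess.run(
        fsck_command(hdfs_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Print the command that would be run for a hypothetical file.
print(shlex.join(fsck_command("/user/example/fileA")))
```

Calling `show_block_locations` on a real path returns the report as text, which you can inspect to check whether a given machine holds a replica of each block.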

I recently found this, which may address what you are trying to do: Controlling HDFS Block Placement
