
Hadoop filesystem size du command

I want to know what the two values in the output of hadoop fs -du mean. The documentation is not clear about it:

In [16]: subprocess.call(["hadoop", "fs", "-du", "-h", "/project/crm/warehouse/"])

Output:

5.9 G  17.8 G  /project/crm/warehouse/n98770_patron_1

What's the real size of the path? 5.9 GB or 17.8 GB?

Thank you

The first column is the actual size of the file or directory, while the second is the disk space consumed once replication is counted.
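To make the two columns concrete, here is a minimal sketch of reading them from Python. It assumes a Hadoop version whose `hadoop fs -du` prints both columns (as in the question) and that the command is run without -h, so both sizes are plain byte counts. The helper names parse_du_line and du are hypothetical, introduced only for this illustration.

```python
import subprocess


def parse_du_line(line):
    """Split one line of `hadoop fs -du` output (run without -h) into fields.

    Assumed layout: <size in bytes> <bytes consumed with replicas> <path>.
    """
    # Split on whitespace at most twice so a path containing spaces stays whole.
    size, consumed, path = line.strip().split(None, 2)
    return int(size), int(consumed), path


def du(path):
    """Run `hadoop fs -du <path>` and return (size, consumed, path) tuples.

    Requires the `hadoop` command on PATH; it is not invoked at import time.
    """
    out = subprocess.check_output(["hadoop", "fs", "-du", path], text=True)
    return [parse_du_line(line) for line in out.splitlines() if line.strip()]
```

Calling du("/project/crm/warehouse/") would then give one tuple per entry, with the logical size first and the replicated footprint second.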

Since HDFS replicates your data, the second field shows how much total disk space the data takes up after replication.

In this case your total size is 17.8 G and the base size is 5.9 G.

17.8 / 5.9 is roughly 3.

This means your HDFS cluster has a replication factor of 3 (the default value).
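The arithmetic above can be sketched in a few lines of Python, using the two values from the question. The function name estimate_replication is hypothetical. Because HDFS sets replication per file, the ratio for a directory is best read as an average rather than a cluster-wide setting.

```python
def estimate_replication(size, consumed):
    """Estimate the average replication factor as consumed space / logical size."""
    if size == 0:
        raise ValueError("cannot estimate replication for an empty path")
    return consumed / size


# Values taken from the question's output (both in the same unit, G).
ratio = estimate_replication(5.9, 17.8)
print(round(ratio, 2))  # 3.02
print(round(ratio))     # 3
```

The small deviation from exactly 3 comes from the -h flag rounding both sizes to one decimal place.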

If your replication factor were 2, the output would be roughly:

5.9 G  12 G  /project/crm/warehouse/n98770_patron_1
