
Expanding HDFS capacity in Cloudera

I need to expand my HDFS capacity from 50 GB to 200 GB in Cloudera. I am using a VM with 300 GB of free space, but HDFS is only configured to use 50 GB. My dfs.namenode.name.dir points to the default dfs/nn:

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/nn</value>
</property>

And hdfs dfsadmin -report gives me:

[root@localhost conf.cloudera.hdfs] hdfs dfsadmin -report
Configured Capacity: 55531445863 (51.72 GB)
Present Capacity: 6482358272 (6.04 GB)
DFS Remaining: 3668803584 (3.42 GB)
DFS Used: 2813554688 (2.62 GB)
DFS Used%: 43.40%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

What should I do? Is there an upper limit on folder capacity in Red Hat (say, 50 GB per folder created)? Should I add a new folder to dfs.namenode.name.dir, with each folder adding 50 GB to the HDFS capacity?

From the resources below, it seems you need to check the dfs.datanode.du.reserved setting on each node and use the formula below to check whether the disk space is being utilized correctly...
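For reference, a minimal sketch of that setting in hdfs-site.xml (the value is per storage volume, in bytes; 4563402752 bytes ≈ 4.25 GB matches the example quoted below and is purely illustrative):

<!-- Reserved space per storage volume for non-DFS use, in bytes.
     Lowering this frees more of the volume for HDFS blocks. -->
<property>
    <name>dfs.datanode.du.reserved</name>
    <value>4563402752</value>
</property>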

As per the property dfs.datanode.du.reserved, it was configured to use 4.25 GB, so I take it that 4.25 GB is reserved for each data directory on a given node. Since I had two data directory partitions, the combined reserved space would be 8.5 GB per node, which brings the configured capacity on each node to 23.5 GB (32 GB - 8.5 GB). I arrived at the following formula: Configured Capacity = Total Disk Space allocated for Data Directories (dfs.data.dir) - Reserved Space for Non-DFS Use (dfs.datanode.du.reserved)
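By that formula, getting from ~50 GB to 200 GB of Configured Capacity means the DataNode data directories must live on volumes with that much space. A hedged sketch of adding a directory on a larger mount (the /data/1/dfs/dn path is hypothetical; dfs.datanode.data.dir is the current name for dfs.data.dir):

<!-- Comma-separated list of DataNode storage directories; each volume's
     space, minus dfs.datanode.du.reserved, adds to Configured Capacity. -->
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///dfs/dn,file:///data/1/dfs/dn</value>
</property>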

"Configured Capacity" shows less size than the original

What exactly "Non DFS Used" means
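Applying that definition to the report above (Non DFS Used = Configured Capacity - DFS Used - DFS Remaining) shows where the space went:

Non DFS Used = 55531445863 - 2813554688 - 3668803584
             = 49049087591 bytes (about 45.68 GB)

So roughly 45.7 GB of the 51.72 GB configured capacity is consumed by non-HDFS files or reservation on the volume, leaving only the 6.04 GB Present Capacity the report shows.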

Update: also see...

dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold - Only used when dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy. This setting controls how much DN volumes are allowed to differ in terms of bytes of free disk space before they are considered imbalanced. If the free space of all the volumes is within this range of each other, the volumes will be considered balanced and block assignments will be done on a pure round-robin basis.

dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction - Only used when dfs.datanode.fsdataset.volume.choosing.policy is set to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy. This setting controls what percentage of new block allocations will be sent to volumes with more available disk space than others. This setting should be in the range 0.0 - 1.0, though in practice 0.5 - 1.0, since there should be no reason to prefer that volumes with less available disk space receive more block allocations.
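A minimal sketch of wiring these together in hdfs-site.xml (the threshold and fraction values shown are the Hadoop defaults, 10 GB and 0.75, and are illustrative, not a recommendation):

<!-- Switch the DataNode to the available-space volume choosing policy. -->
<property>
    <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
    <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<!-- Volumes whose free space differs by less than this many bytes are
     treated as balanced (10737418240 bytes = 10 GB). -->
<property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
    <value>10737418240</value>
</property>
<!-- Fraction of new block allocations sent to the emptier volumes. -->
<property>
    <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
    <value>0.75</value>
</property>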
