
Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 with 1 master node and 2 worker nodes, each with 2.7 GB of memory.

However, when I tried to put a 3 GB file onto HDFS with the command below,

/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin

it returned the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload smaller files, but not once they exceed a certain size (about 2.2 GB).

If the file exceeds the memory size of a node, wouldn't it be split by Hadoop onto the other node?
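(For reference: HDFS does split a file into fixed-size blocks and spreads them over the datanodes, but each block, plus its replicas, still has to land on a datanode with enough free space. A quick way to check the configured block size, assuming a Hadoop 2.x or newer client; on older releases the property is named dfs.block.size:)

hdfs getconf -confKey dfs.blocksize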

Edit: Summary of my understanding of the issue you are facing:

1) Total HDFS free size is 5.32 GB

2) HDFS free size on each node is 2.6 GB

Note: You have bad blocks (4 Blocks with corrupt replicas)
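You can inspect those bad blocks with fsck (a quick check, using the same ephemeral-hdfs path as your put command):

# print the HDFS health summary, including the count of corrupt blocks
/root/ephemeral-hdfs/bin/hadoop fsck /

# add -files -blocks -locations to see which files the bad blocks belong to
/root/ephemeral-hdfs/bin/hadoop fsck / -files -blocks -locations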

The following Q&A mentions similar issues: Hadoop put command throws - could only be replicated to 0 nodes, instead of 1

In that case, running jps showed that the datanode was down.

These Q&As suggest how to restart the datanode:

What is best way to start and stop hadoop ecosystem, with command line? Hadoop - Restart datanode and tasktracker

Please try to restart your datanode, and let us know if that solves the problem.
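A minimal sketch of that check and restart, assuming the same /root/ephemeral-hdfs layout as in your put command (script locations differ between Hadoop versions; in Hadoop 2.x hadoop-daemon.sh lives under sbin/ instead of bin/):

# on each worker node: list the running Hadoop daemons; a healthy worker should show DataNode
jps

# if DataNode is missing, restart it
/root/ephemeral-hdfs/bin/hadoop-daemon.sh stop datanode
/root/ephemeral-hdfs/bin/hadoop-daemon.sh start datanode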


When using HDFS, you have one shared file system, i.e. all nodes share the same file system.

From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put a 3 GB file there.

Execute the following commands to get the HDFS free size:

hdfs dfs -df -h

hdfs dfsadmin -report

or (for older versions of HDFS)

hadoop fs -df -h

hadoop dfsadmin -report
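In the -report output, check the "DFS Remaining" value for each datanode and make sure both datanodes are listed as live. If there is not enough free space for the file, free some before retrying the upload, for example (the file name below is only an illustration):

# hypothetical file; -skipTrash reclaims the space immediately instead of moving the file to .Trash
/root/ephemeral-hdfs/bin/hadoop fs -rm -skipTrash /user/root/old-data.bin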
