
Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 with 1 master node and 2 worker nodes, each with 2.7 GB of memory.

However, when I tried to put a 3 GB file onto HDFS with the command below,

/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin

it returned the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload smaller files, but not once they exceed a certain size (about 2.2 GB).

If the file exceeds the memory size of a node, wouldn't it be split by Hadoop onto the other node?
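(For reference: HDFS does split a file into fixed-size blocks and spreads them over the datanodes, but each block, plus its replicas, still has to land on a datanode with enough free space. A quick way to check the configured block size, assuming a Hadoop 2.x or newer client; on older releases the property is named dfs.block.size:)

hdfs getconf -confKey dfs.blocksize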

Edit: Summary of my understanding of the issue you are facing:

1) Total HDFS free size is 5.32 GB

2) HDFS free size on each node is 2.6 GB

Note: You have bad blocks (4 Blocks with corrupt replicas)
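You can inspect those bad blocks with fsck (a quick check, using the same ephemeral-hdfs path as your put command):

# print the HDFS health summary, including the count of corrupt blocks
/root/ephemeral-hdfs/bin/hadoop fsck /

# add -files -blocks -locations to see which files the bad blocks belong to
/root/ephemeral-hdfs/bin/hadoop fsck / -files -blocks -locations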

The following Q&A mentions similar issues: Hadoop put command throws - could only be replicated to 0 nodes, instead of 1

In that case, running jps showed that the datanode was down.

These Q&As suggest how to restart the datanode:

What is best way to start and stop hadoop ecosystem, with command line? Hadoop - Restart datanode and tasktracker

Please try to restart your datanode, and let us know if that solves the problem.
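A minimal sketch of that check and restart, assuming the same /root/ephemeral-hdfs layout as in your put command (script locations differ between Hadoop versions; in Hadoop 2.x hadoop-daemon.sh lives under sbin/ instead of bin/):

# on each worker node: list the running Hadoop daemons; a healthy worker should show DataNode
jps

# if DataNode is missing, restart it
/root/ephemeral-hdfs/bin/hadoop-daemon.sh stop datanode
/root/ephemeral-hdfs/bin/hadoop-daemon.sh start datanode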


When using HDFS, you have one shared file system, i.e. all nodes share the same file system.

From your description, the current free space on HDFS is about 2.2 GB, while you are trying to put a 3 GB file there.

Execute the following commands to get the HDFS free size:

hdfs dfs -df -h

hdfs dfsadmin -report

or (for older versions of HDFS)

hadoop fs -df -h

hadoop dfsadmin -report
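In the -report output, check the "DFS Remaining" value for each datanode and make sure both datanodes are listed as live. If there is not enough free space for the file, free some before retrying the upload, for example (the file name below is only an illustration):

# hypothetical file; -skipTrash reclaims the space immediately instead of moving the file to .Trash
/root/ephemeral-hdfs/bin/hadoop fs -rm -skipTrash /user/root/old-data.bin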
