
Unable to load large file to HDFS on Spark cluster master node

I have fired up a Spark cluster on Amazon EC2 containing 1 master node and 2 worker nodes that have 2.7 GB of memory each.

However, when I tried to put a 3 GB file onto HDFS with the command below

/root/ephemeral-hdfs/bin/hadoop fs -put /root/spark/2GB.bin 2GB.bin

it returns the error "/user/root/2GB.bin could only be replicated to 0 nodes, instead of 1". FYI, I am able to upload smaller files, but not when they exceed a certain size (about 2.2 GB).

If a file exceeds the memory size of a single node, wouldn't Hadoop split it across the other nodes?

Edit: Summary of my understanding of the issue you are facing:

1) Total HDFS free size is 5.32 GB

2) HDFS free size on each node is 2.6 GB

Note: You have bad blocks (4 blocks with corrupt replicas)
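You can verify the block health yourself with fsck (a minimal sketch; the ephemeral-hdfs path is taken from your put command, and the exact summary labels vary slightly across Hadoop versions):

/root/ephemeral-hdfs/bin/hadoop fsck /
# the summary at the end reports total blocks, corrupt blocks,
# under-replicated blocks, and whether the filesystem is HEALTHY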

The following Q&A mentions similar issues: Hadoop put command throws - could only be replicated to 0 nodes, instead of 1

In that case, running jps showed that the datanode was down.

These Q&As suggest ways to restart the datanode:

What is best way to start and stop hadoop ecosystem, with command line?
Hadoop - Restart datanode and tasktracker

Please try restarting your datanode, and let us know whether it solved the problem.
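On a spark-ec2 cluster the ephemeral-hdfs scripts usually live under /root/ephemeral-hdfs/bin (an assumption based on the path in your question); a minimal restart from the master node could look like:

/root/ephemeral-hdfs/bin/stop-dfs.sh    # stop the namenode and all datanodes
/root/ephemeral-hdfs/bin/start-dfs.sh   # bring HDFS back up
jps                                     # on each node: DataNode should now be listed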


When using HDFS - you have one shared file system

i.e. all nodes share the same file system

From your description - the current free space on HDFS is about 2.2 GB, while you are trying to put 3 GB there.

Execute the following commands to get the HDFS free size:

hdfs dfs -df -h

hdfs dfsadmin -report

or (for older versions of HDFS)

hadoop fs -df -h

hadoop dfsadmin -report
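To pull just the relevant numbers out of the report (a sketch; the field labels below match the classic Hadoop 1.x report format and may differ in newer versions):

hadoop dfsadmin -report | grep -E 'DFS Remaining|Datanodes available'
# 'Datanodes available: 0' would mean the datanodes are down rather than full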
