
Amazon EC2 - network issues

We are launching a Hadoop cluster on Amazon EC2, and recently we have been having network issues, such as the master being unable to connect to a slave. We thought the cause was Amazon throttling network connections beyond some limit, so we tried having each slave node establish its connection after a random delay. But that didn't help.
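
The question does not show how the random delay was implemented. As a rough illustration only, here is a minimal Python sketch, assuming each slave opens a plain TCP connection to the master. The function name `connect_with_jitter` and its parameters are hypothetical, and exponential backoff with "full jitter" is one common way to spread out reconnect attempts, not necessarily what the poster used.

```python
import random
import socket
import time


def connect_with_jitter(host, port, attempts=5, base_delay=0.5,
                        max_delay=8.0, timeout=3.0):
    """Open a TCP connection to (host, port), sleeping a random delay before
    each attempt. Returns the connected socket or re-raises the last error.

    Hypothetical helper for illustration; not part of Hadoop or any AWS tool.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        # Exponential backoff with "full jitter": the upper bound doubles on
        # each attempt (capped at max_delay) and the actual wait is uniformly
        # random below it, so slaves do not all retry at the same moment.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        time.sleep(random.uniform(0, ceiling))
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            # Refused, unreachable, or timed out: remember it and try again.
            last_error = err
    raise last_error
```

For example, a slave could call `connect_with_jitter(master_host, 8020)` before starting its daemons, where `master_host` and port 8020 are placeholders for your NameNode's address and RPC port. As the question notes, spreading out connections like this did not fix the problem, which points away from throttling and toward something like blocked ports.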

Are there any other suggestions?

Thank you,
Bala

Have you tried using the hadoop-ec2 scripts from Cloudera? I've been using them to set up occasional Hadoop clusters for my thesis research, and I've found that they work quite well. The setup takes a few minutes, but once it's done you just run

hadoop-ec2 launch-cluster <clustername> <number of slaves>

and it sets up everything you need, and usually does a really good job. Occasionally a node won't start up or something else goes wrong, but it's easy enough to terminate the cluster and try again, and it doesn't cost much.

You can find instructions for setting them up here:

http://archive.cloudera.com/docs/ec2.html

Do you have the right ports open in the security group that your cluster instances use? I'm not familiar with Hadoop, but if it uses custom TCP or UDP ports for communication between nodes, you'll need to allow them in your security group.
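
One way to check this is to probe the relevant ports from one node to another. The sketch below is a hedged example, assuming Python 3 is available on the nodes; the port list reflects commonly used defaults for older (0.20-era) Hadoop releases and is an assumption, so check it against your own Hadoop configuration. Because security groups generally drop disallowed packets silently, a blocked port usually shows up as a timeout, whereas "refused" means the packet reached the host but nothing was listening there.

```python
import socket

# ASSUMPTION: commonly used default ports for older (0.20-era) Hadoop.
# Verify each one against your own Hadoop configuration files.
HADOOP_PORTS = {
    "namenode-rpc": 8020,
    "jobtracker-rpc": 8021,
    "datanode-data": 50010,
    "datanode-ipc": 50020,
    "jobtracker-http": 50030,
    "tasktracker-http": 50060,
    "namenode-http": 50070,
    "datanode-http": 50075,
}


def check_port(host, port, timeout=2.0):
    """Classify a single TCP port as "open", "refused", "timeout" or "error"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The packet reached the host, but no process is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer at all: often a sign that a firewall/security group dropped it.
        return "timeout"
    except OSError:
        # Anything else, e.g. host unreachable or name resolution failure.
        return "error"


def check_ports(host, ports, timeout=2.0):
    """Return {name: status} for every (name, port) pair in `ports`."""
    return {name: check_port(host, port, timeout) for name, port in ports.items()}
```

For example, running `check_ports("10.0.0.12", HADOOP_PORTS)` on the master, where `10.0.0.12` is a hypothetical private IP of a slave, shows which ports time out and therefore likely need an inbound rule in the security group.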

Using Amazon Elastic MapReduce would alleviate many of these issues and provide some I/O improvements, both to S3 and between nodes, as well as a few AWS-specific patches that improve robustness.

It's probably wise to stay away from the EC2 cluster scripts unless you need a specific version of Hadoop, and you really shouldn't need one.
