Amazon EC2 - network issues

We are launching a Hadoop cluster on Amazon EC2, and recently we have been having network issues, such as the master being unable to connect to the slaves. We suspected that Amazon throttles network connections above some limit, so we tried having each slave node establish its connection after a random delay. That didn't help.

Are there any other suggestions?

Thank you Bala

Have you tried using the hadoop-ec2 scripts from Cloudera? I've been using them to set up occasional Hadoop clusters for my thesis research, and I've found them to work quite well. The setup takes a few minutes, but once it's set up you just run

hadoop-ec2 launch-cluster <clustername> <number of slaves>

and it sets up everything you need, and usually does a really good job. Occasionally a node won't start up, but it's easy enough to terminate the cluster and try again, and it doesn't cost too much.

You can find the instructions for setting them up here:

http://archive.cloudera.com/docs/ec2.html

Do you have the right ports open in the security group that your cluster instances use? I'm not familiar with Hadoop, but if it uses custom TCP or UDP ports for communication between nodes, then you'll need to allow them in your security group.
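One quick way to test this is to attempt a plain TCP connection from the master to each slave on the ports Hadoop is expected to use. Below is a minimal Python sketch. The port numbers are assumptions based on common defaults for older Hadoop releases and may not match your configuration, and the function names and the host name in the usage comment are hypothetical. A timeout, rather than an immediate refusal, often points to filtering such as a missing security group rule.

```python
import socket

# Assumed default ports for classic Hadoop daemons. These are NOT taken
# from the question; verify them against your own *-site.xml files.
HADOOP_PORTS = {
    "NameNode IPC": 8020,
    "DataNode data transfer": 50010,
    "DataNode IPC": 50020,
    "TaskTracker HTTP": 50060,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, ports=HADOOP_PORTS, timeout=3.0):
    """Map each (host, service name) pair to whether its port accepted a connection."""
    return {
        (host, name): port_is_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }


# Example usage (hypothetical host name), run from the master node:
#   for (host, name), ok in check_hosts(["slave-1.internal"]).items():
#       print(host, name, "open" if ok else "unreachable")
```

If a port is unreachable from the master but works when tested on the slave itself against localhost, the security group (or a host firewall) is the likely culprit rather than Hadoop.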

Using Amazon Elastic MapReduce would alleviate many of these issues. It also provides some I/O boosts to S3 and between nodes, as well as a few AWS-specific patches that improve robustness.

It's probably wise to stay away from the EC2 cluster scripts unless you need a specific version of Hadoop, and you really shouldn't need one.
