
Running Hadoop jobs on Amazon EC2: multi-node cluster

I need to run Hadoop MapReduce jobs on an Amazon EC2 cluster.

I tried setting it up using existing AMIs, but after starting the master and the client nodes, `jps` doesn't list any nodes.

So even after using a public Hadoop AMI, do we still have to configure Hadoop for the masters and slaves ourselves? How will the master know the IP addresses of the slaves?

Can anyone please direct me to some good documentation? I have been banging my head against this for more than 12 hours now.

Can anyone please help?

Thanks.

An alternative to what Matthew suggested is Apache Whirr.

Whirr makes it really easy to deploy a Hadoop cluster on Amazon EC2: you don't pay the extra per-instance fee that Elastic MapReduce charges, and you can control which Hadoop version the cluster runs.

Here's the project homepage: http://whirr.apache.org/

Here is the quickstart guide for installing Hadoop; it takes about 5 minutes to get a running Hadoop cluster: http://whirr.apache.org/docs/0.6.0/quick-start-guide.html
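As a rough sketch of what the quickstart involves (cluster name and instance counts here are placeholders you would change), you describe the cluster in a properties file and Whirr provisions the EC2 instances and wires the master and slaves together for you:

```properties
# hadoop.properties — minimal Whirr cluster definition (illustrative values)
whirr.cluster-name=myhadoopcluster
# one master (namenode + jobtracker) and one worker (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
# AWS credentials are read from environment variables
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
```

You would then launch with `bin/whirr launch-cluster --config hadoop.properties` and tear down with `bin/whirr destroy-cluster --config hadoop.properties`; see the quickstart guide above for the exact steps for your Whirr version.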

I would use Amazon's Elastic MapReduce framework instead. You can dynamically spin machines and whole clusters up and down, and you don't have to worry about configuring them to talk to each other.

http://aws.amazon.com/elasticmapreduce/

It's used by lots of people, and it's mostly reliable. It will save you an absolute TON of work normally spent setting up and administering a cluster. One thing is different from regular Hadoop, though: it's best to keep your data in S3 instead of HDFS, because the clusters are transient and the HDFS data disappears when the cluster is terminated.
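To make the S3-instead-of-HDFS point concrete: a common way to run jobs on EMR is Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin and writing tab-separated `key\tvalue` lines, and the `-input`/`-output` paths point at `s3://` locations rather than HDFS. A minimal word-count sketch (the bucket paths in the comments are hypothetical):

```python
#!/usr/bin/env python3
# Word-count sketch for Hadoop Streaming. On EMR you would add this as a
# streaming step with S3 paths (hypothetical bucket), roughly:
#   -input  s3://my-bucket/input/
#   -output s3://my-bucket/output/
#   -mapper "wordcount.py map" -reducer "wordcount.py reduce"
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word, as Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Sum the counts for each word. Input must be sorted by key, which the
    Hadoop shuffle phase guarantees before the reducer runs."""
    keyed = (p.split("\t") for p in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    stage = mapper if (len(sys.argv) < 2 or sys.argv[1] == "map") else reducer
    for out in stage(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

Because the job's input and output live in S3, the results survive the cluster: you can terminate the EMR cluster after the step finishes and spin up a fresh one later against the same bucket.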
