
Running Hadoop jobs on Amazon EC2: multi-node cluster

I need to run Hadoop MapReduce jobs on an Amazon EC2 cluster.

I tried setting it up using existing AMIs, but after starting the master and the client nodes, `jps` doesn't list any nodes.

So even after using a public Hadoop AMI, do we still have to configure Hadoop for the masters and slaves ourselves? How will the master know the IP addresses of the slaves?

Can anyone please direct me to some good documentation? I have been banging my head against this for more than 12 hours now.

Can anyone please help?

Thanks.

An alternative to what Matthew suggested is Apache Whirr.

Whirr makes it really easy to deploy a Hadoop cluster on Amazon EC2: you don't pay the extra per-instance fee that Elastic MapReduce charges, and you can control which Hadoop version the cluster runs.

Here's the project homepage: http://whirr.apache.org/

Here is the quickstart guide for installing Hadoop; it takes about 5 minutes to get a running Hadoop cluster: http://whirr.apache.org/docs/0.6.0/quick-start-guide.html
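As a rough sketch of what the quickstart involves (cluster name and instance counts here are placeholders you would change), you describe the cluster in a properties file and Whirr provisions the EC2 instances and wires the master and slaves together for you:

```properties
# hadoop.properties — minimal Whirr cluster definition (illustrative values)
whirr.cluster-name=myhadoopcluster
# one master (namenode + jobtracker) and one worker (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
# AWS credentials are read from environment variables
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
```

You would then launch with `bin/whirr launch-cluster --config hadoop.properties` and tear down with `bin/whirr destroy-cluster --config hadoop.properties`; see the quickstart guide above for the exact steps for your Whirr version.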

I would use Amazon's Elastic MapReduce framework instead. You can dynamically spin machines and whole clusters up and down, and you don't have to worry about configuring them to talk to each other.

http://aws.amazon.com/elasticmapreduce/

It's used by lots of people, and it's mostly reliable. It will save you an absolute TON of work normally spent setting up and administering a cluster. One thing is different from regular Hadoop, though: it's best to keep your data in S3 instead of HDFS, because the clusters are transient and the HDFS data disappears when the cluster is terminated.
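To make the S3-instead-of-HDFS point concrete: a common way to run jobs on EMR is Hadoop Streaming, where the mapper and reducer are plain scripts reading stdin and writing tab-separated `key\tvalue` lines, and the `-input`/`-output` paths point at `s3://` locations rather than HDFS. A minimal word-count sketch (the bucket paths in the comments are hypothetical):

```python
#!/usr/bin/env python3
# Word-count sketch for Hadoop Streaming. On EMR you would add this as a
# streaming step with S3 paths (hypothetical bucket), roughly:
#   -input  s3://my-bucket/input/
#   -output s3://my-bucket/output/
#   -mapper "wordcount.py map" -reducer "wordcount.py reduce"
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word, as Streaming expects."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Sum the counts for each word. Input must be sorted by key, which the
    Hadoop shuffle phase guarantees before the reducer runs."""
    keyed = (p.split("\t") for p in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    stage = mapper if (len(sys.argv) < 2 or sys.argv[1] == "map") else reducer
    for out in stage(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

Because the job's input and output live in S3, the results survive the cluster: you can terminate the EMR cluster after the step finishes and spin up a fresh one later against the same bucket.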
