
Installing HBase / Hadoop on an EC2 cluster

I know that I can spin up an EC2 cluster with Hadoop installed (unless I am wrong about that). How about HBase? Can I have Hadoop and HBase pre-installed and ready to go, or do I need to get my hands dirty? If that is not an option, what is the best alternative? Cloudera apparently has a package with both. Is that the way to go?

Thanks for the help.

HBase ships with a set of EC2 scripts that get you set up and ready to go very quickly. They let you configure the number of ZooKeeper servers as well as slave nodes, though I'm not sure which versions they are available in; I'm using 0.20.6. After setting up some of your S3/EC2 information, you can do things like:

/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster CLUSTERNAME SLAVES ZKSERVERS

to quickly start using the cluster. It's nice because it installs LZO support for you as well.
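For example, a call with concrete values (the cluster name and node counts below are just placeholders), launching a cluster named testcluster with 3 slaves and 3 ZooKeeper servers:

/usr/local/hbase-0.20.6/contrib/ec2/bin/launch-hbase-cluster testcluster 3 3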

Here are some parameters from the environment file in the bin directory that might be useful (if you want a 0.20.6 AMI):

# The version of HBase to use.
HBASE_VERSION=0.20.6

# The version of Hadoop to use.
HADOOP_VERSION=0.20.2

# The Amazon S3 bucket where the HBase AMI is stored.
# Change this value only if you are creating your own (private) AMI
# so you can store it in a bucket you own.
#S3_BUCKET=apache-hbase-images
S3_BUCKET=720040977164

# Enable public access web interfaces
ENABLE_WEB_PORTS=false

# Extra packages
# Allows you to add a private Yum repo and pull packages from it as your
# instances boot up. Format is <repo-descriptor-URL> <pkg1> ... <pkgN>
# The repository descriptor will be fetched into /etc/yum/repos.d.
EXTRA_PACKAGES=

# Use only c1.xlarge unless you know what you are doing
MASTER_INSTANCE_TYPE=${MASTER_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.xlarge unless you know what you are doing
SLAVE_INSTANCE_TYPE=${SLAVE_INSTANCE_TYPE:-c1.xlarge}

# Use only c1.medium unless you know what you are doing
ZOO_INSTANCE_TYPE=${ZOO_INSTANCE_TYPE:-c1.medium}

You also might need to set your Java version if JAVA_HOME is not set in the AMI (and I don't think it is). Newer versions of HBase are probably available in other S3 buckets; just run a describe-images and grep for hadoop/hbase to narrow the results.
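A minimal sketch of both steps, assuming the classic ec2-api-tools are installed; the JDK path below is a guess, so adjust it to whatever the AMI actually ships:

# export JAVA_HOME in your shell (or in the environment file mentioned above) if the AMI doesn't set it
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# list the AMIs visible to you and narrow the output down to hadoop/hbase images
ec2-describe-images -a | grep -i -e hadoop -e hbase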

In my opinion, the easiest and fastest way to run HBase on EC2 is to use Apache Whirr.
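For what it's worth, a minimal sketch of what that looks like, loosely based on Whirr's bundled HBase recipe (the cluster name, node counts and key path are placeholders):

# hbase-ec2.properties
whirr.cluster-name=myhbasecluster
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,3 hadoop-datanode+hadoop-tasktracker+hbase-regionserver
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa

# then, from the Whirr distribution directory:
bin/whirr launch-cluster --config hbase-ec2.properties
bin/whirr destroy-cluster --config hbase-ec2.properties   # tear it down again when done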

Are you aware of Amazon Elastic MapReduce? It doesn't offer HBase, but it does offer plain ol' Hadoop, Hive and Pig (in fairly recent versions). The big win is that they don't start charging you until 90% of your nodes are up; the downside is a slight premium per hour over normal EC2.
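If you want to try it, Amazon's elastic-mapreduce command-line client can start an interactive Hive/Pig cluster in a single call. This is only a rough sketch; the exact flags vary a bit between client versions, and the name, node count and instance type are placeholders:

elastic-mapreduce --create --alive --name "hive-test" \
  --num-instances 3 --instance-type m1.large \
  --hive-interactive --pig-interactive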

If you really need/want to use HBase, then you may be better off spinning something up yourself. See the following Cloudera blog post for a discussion of Hive and HBase integration: http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase/
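To give a flavour of what that integration looks like: Hive can map a table onto an existing HBase table via its HBaseStorageHandler. A rough sketch only (the table and column names are made up, and it assumes the hive-hbase handler and HBase/ZooKeeper jars are on Hive's aux path):

# run from a node that can reach the HBase cluster
hive -e "
CREATE EXTERNAL TABLE hbase_page_views(key STRING, views INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,stats:views')
TBLPROPERTIES ('hbase.table.name' = 'page_views');
"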
