Hadoop Processing time in clustered and standalone system

I have set up a 3-node Hadoop cluster (1 NameNode, 2 DataNodes) with HBase on top of the same HDFS. Each node is a 512 MB Ubuntu VirtualBox image running on my Windows 8 machine (Intel i5, 4 GB RAM, 2.4 GHz).
I configured Hadoop and HBase following this blog post: http://ankitasblogger.blogspot.in/2011/01/hadoop-cluster-setup.html

I have written a program that analyzes US Census data, approximately 500,000 records (a reduced set). I just read the file (from HDFS) in the map task and store it in HBase, and later retrieve data based on a filter.

When I run the program on a standalone Hadoop/HBase setup (a single 512 MB virtual machine), it takes around 23 minutes. But when I run the same jar on the cluster (3 × 512 MB), it takes upwards of 40 minutes.

Why is the cluster taking more time to process? Or is this an expected result?

Running a cluster in virtual machines will only slow down your MapReduce job, because of the overhead of running the guest operating systems and multiple Hadoop instances. This gets worse if you run out of memory and the host OS has to fall back on swap.

Keep in mind that the virtual machines all share one physical CPU, so a setup like this should only be used for development.
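To put rough numbers on the memory point, here is a minimal sketch using only the figures from the question (4 GB host, 512 MB per VM). The class and method names are made up for illustration. It ignores VirtualBox's own per-VM overhead and whatever Windows itself needs, so the real headroom is smaller than what it prints.

```java
public class VmMemoryBudget {

    // Total RAM handed to the guests, in MB.
    static int totalGuestMb(int vmCount, int mbPerVm) {
        return vmCount * mbPerVm;
    }

    // RAM left on the host after the guests take their share, in MB.
    // Assumption: hypervisor overhead and host OS usage are NOT subtracted here.
    static int hostHeadroomMb(int hostMb, int vmCount, int mbPerVm) {
        return hostMb - totalGuestMb(vmCount, mbPerVm);
    }

    public static void main(String[] args) {
        int hostMb = 4 * 1024; // 4 GB host, from the question
        int mbPerVm = 512;     // per-VM allocation, from the question

        // Compare the standalone run (1 VM) with the cluster run (3 VMs).
        for (int vmCount : new int[] {1, 3}) {
            System.out.println(vmCount + " VM(s): guests use "
                    + totalGuestMb(vmCount, mbPerVm) + " MB, host keeps "
                    + hostHeadroomMb(hostMb, vmCount, mbPerVm) + " MB");
        }
    }
}
```

Going from one VM to three takes an extra 1 GB away from the host while the CPU stays the same, and each guest still has only 512 MB for its Hadoop and HBase daemons. So the extra nodes add coordination and network overhead without adding any real hardware.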
