
Hadoop and map-reduce on multicore machines

I have read a lot about Hadoop and Map-Reduce running on clusters of machines. Does anyone know whether the Apache distribution can be run on an SMP machine with several cores? In particular, can multiple Map-Reduce processes run on the same machine, with the scheduler taking care of spreading them across the cores? Thanks. - KG

Yes. Each machine has multiple map and reduce slots, and how many it can usefully run is determined by its RAM and CPU. Each task JVM needs 1 GB by default, so an 8 GB machine with 16 cores would still have only about 7 task slots: memory, not core count, is the limit there.
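The arithmetic above can be sketched as a small Python helper. It assumes 1 GB per task JVM (as stated above) and assumes 1 GB is held back for the OS and Hadoop daemons, which is how 8 GB becomes 7 slots; both numbers are parameters, not values read from Hadoop.

```python
def estimate_task_slots(ram_gb, cores, gb_per_jvm=1.0, reserved_gb=1.0):
    """Rough upper bound on concurrent task JVMs on one node.

    Assumptions (not Hadoop settings): each task JVM uses `gb_per_jvm` of RAM,
    and `reserved_gb` is kept free for the OS and the Hadoop daemons.
    The result is limited by whichever runs out first: memory or cores.
    """
    by_memory = int((ram_gb - reserved_gb) // gb_per_jvm)
    return max(0, min(by_memory, cores))


if __name__ == "__main__":
    print(estimate_task_slots(8, 16))  # 7: memory, not the 16 cores, is the limit
```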

From the Hadoop wiki:

Use the configuration knobs mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to control the number of maps/reduces spawned simultaneously on a TaskTracker. By default, each is set to 2, so one sees at most 2 maps and 2 reduces running at any given instant on a TaskTracker.

You can set these on a per-TaskTracker basis to reflect your hardware accurately (i.e., set them to higher numbers on a beefier TaskTracker).
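A minimal sketch, using Python's standard xml.etree.ElementTree, of generating the mapred-site.xml overrides the wiki describes. The property names are the MRv1 (Hadoop 1.x) ones quoted above; YARN-based Hadoop 2.x replaces slots with containers and does not use them. The function name and the values 6 and 2 in the usage line are hypothetical examples for a larger node, not recommendations.

```python
import xml.etree.ElementTree as ET


def tasktracker_slots_xml(map_slots, reduce_slots):
    """Build a mapred-site.xml <configuration> element (as a string) that
    overrides the per-TaskTracker map and reduce slot limits."""
    conf = ET.Element("configuration")
    for name, value in [
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values for a beefier TaskTracker.
    print(tasktracker_slots_xml(6, 2))
```

Because the limits are per-TaskTracker, each machine can get its own file with values matched to its RAM and cores.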

You can also use lightweight MapReduce frameworks built for multicore computers.

For example:

LeoTask: A lightweight, productive, and reliable mapreduce framework for multicore computers

https://github.com/mleoking/LeoTask
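LeoTask's own API is not shown here. As a rough illustration of the underlying idea (map-reduce on one multicore machine), here is a word count using Python's standard multiprocessing module instead: the map step runs in separate worker processes, which the OS can schedule on different cores, and the reduce step merges their results in the parent. This is a swapped-in technique, not how LeoTask or Hadoop work internally.

```python
from collections import Counter
from multiprocessing import Pool


def map_chunk(lines):
    """Map step: word counts for one chunk of input lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total


def word_count(lines, workers=4):
    """Run map_chunk in `workers` processes, then reduce in the parent."""
    chunks = [lines[i::workers] for i in range(workers)]  # round-robin split
    with Pool(processes=workers) as pool:
        partials = pool.map(map_chunk, chunks)
    return reduce_counts(partials)


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the end"]
    print(word_count(text, workers=2)["the"])  # 3
```

Processes rather than threads are used because CPython threads do not run Python bytecode on several cores at once.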

For Apache Hadoop 2.7.3, my experience has been that enabling YARN also enables multi-core support. Here is a simple guide to enabling YARN on a single node:

https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SingleCluster.html#YARN_on_a_Single_Node
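The YARN-specific part of that guide comes down to two properties: mapreduce.framework.name set to yarn in mapred-site.xml, and yarn.nodemanager.aux-services set to mapreduce_shuffle in yarn-site.xml. A Python sketch that writes just those two files follows; the guide's other steps (pseudo-distributed HDFS setup, starting the daemons) are not covered, and the function names are hypothetical. It overwrites existing site files, so point it at a scratch directory first.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# The two YARN-related properties from the single-node guide.
SINGLE_NODE_YARN = {
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}


def to_configuration_xml(props):
    """Render a {name: value} dict as a Hadoop <configuration> document."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


def write_single_node_yarn_conf(conf_dir):
    """Write both site files into conf_dir, overwriting existing copies."""
    conf_dir = Path(conf_dir)
    conf_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, props in SINGLE_NODE_YARN.items():
        path = conf_dir / filename
        path.write_text(to_configuration_xml(props), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for path in write_single_node_yarn_conf(scratch):
            print(path.name, path.read_text(encoding="utf-8"))
```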

The default configuration seems to work pretty well. If you want to tune core usage, look into setting 'yarn.scheduler.minimum-allocation-vcores' and 'yarn.scheduler.maximum-allocation-vcores' in yarn-site.xml (their defaults are listed at https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml ).
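Before tuning, it can help to see what a yarn-site.xml already sets. A small Python sketch (read_vcore_limits is a hypothetical name) that extracts the two vcore properties named above: a property that is absent is reported as None, meaning the yarn-default.xml value applies, and an explicit minimum larger than the maximum is flagged. It assumes plain integer values, not variable substitutions.

```python
import xml.etree.ElementTree as ET

VCORE_KEYS = (
    "yarn.scheduler.minimum-allocation-vcores",
    "yarn.scheduler.maximum-allocation-vcores",
)


def read_vcore_limits(yarn_site_xml):
    """Return the vcore bounds explicitly set in a yarn-site.xml document.

    Unset keys map to None (the yarn-default.xml value then applies).
    Raises ValueError if an explicit minimum exceeds an explicit maximum.
    """
    root = ET.fromstring(yarn_site_xml)
    found = {key: None for key in VCORE_KEYS}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in found:
            found[name] = int((prop.findtext("value") or "").strip())
    low, high = found[VCORE_KEYS[0]], found[VCORE_KEYS[1]]
    if low is not None and high is not None and low > high:
        raise ValueError(f"minimum vcores ({low}) exceeds maximum ({high})")
    return found


if __name__ == "__main__":
    sample = (
        "<configuration><property>"
        "<name>yarn.scheduler.maximum-allocation-vcores</name><value>4</value>"
        "</property></configuration>"
    )
    print(read_vcore_limits(sample))
```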

Also, see here for instructions on how to configure a simple Hadoop sandbox with multicore support: https://bitbucket.org/aperezrathke/hadoop-aee
