I have read a lot about Hadoop and MapReduce running on clusters of machines. Does anyone know whether the Apache distribution can be run on an SMP machine with several cores? In particular, can multiple MapReduce processes be run on the same machine? The scheduler would then take care of spreading them across the cores. Thanks. - KG
Yes. Each machine has multiple map and reduce slots, and their number is determined by its RAM and CPU. Each JVM instance needs 1GB by default, so an 8GB machine with 16 cores should still have only 7 task slots: memory, not the core count, is the limit here.
From the Hadoop wiki:
Use the configuration knobs mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to control the number of maps/reduces spawned simultaneously on a TaskTracker. By default, each is set to 2, hence one sees a maximum of 2 maps and 2 reduces at any given instant on a TaskTracker.
You can set those on a per-TaskTracker basis to accurately reflect your hardware (i.e. set them to higher numbers on a beefier TaskTracker, etc.).
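As a rough sketch of what that looks like, the two properties go into mapred-site.xml on the TaskTracker in question. The property names are the ones quoted above; the values 6 and 2 are only illustrative assumptions for a multi-core box with enough RAM, not recommendations.

```xml
<!-- mapred-site.xml on one TaskTracker (MRv1 property names as quoted above) -->
<configuration>
  <property>
    <!-- maximum number of map tasks run at the same time on this TaskTracker -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- illustrative value only; the default is 2 -->
    <value>6</value>
  </property>
  <property>
    <!-- maximum number of reduce tasks run at the same time on this TaskTracker -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <!-- illustrative value only; the default is 2 -->
    <value>2</value>
  </property>
</configuration>
```

Keep map slots plus reduce slots within what the machine's memory can hold, since every slot can become a separate JVM. The TaskTracker will most likely need a restart before it picks up the new values.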
You can also use one of the lightweight MapReduce frameworks designed for multicore computers.
For example:
LeoTask: A lightweight, productive, and reliable mapreduce framework for multicore computers
For Apache Hadoop 2.7.3, my experience has been that enabling YARN will also enable multi-core support. Here is a simple guide for enabling YARN on a single node:
The default configuration seems to work pretty well. If you want to tune your core usage, then perhaps look into setting 'yarn.scheduler.minimum-allocation-vcores' and 'yarn.scheduler.maximum-allocation-vcores' within yarn-site.xml ( https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml )
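A minimal sketch of how those two settings might look in yarn-site.xml. The property names are the ones mentioned above; the values are illustrative assumptions for a small single-node setup, so check them against yarn-default.xml for your release before relying on them.

```xml
<!-- yarn-site.xml (values below are illustrative assumptions) -->
<configuration>
  <property>
    <!-- smallest number of virtual cores the scheduler grants to a container request -->
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <!-- largest number of virtual cores a single container may request -->
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <!-- illustrative: cap a container at 4 vcores -->
    <value>4</value>
  </property>
</configuration>
```

Restart the YARN daemons after editing the file so that the new limits are read.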
Also, see here for instructions on how to configure a simple Hadoop sandbox with multicore support: https://bitbucket.org/aperezrathke/hadoop-aee