How can load balancing be handled in Hadoop MapReduce?

How can load balancing be handled in Hadoop MapReduce? I am writing a distributed application in which the server distributes jobs to worker nodes based on a benchmark test, available memory, number of CPU cores, CPU usage, and the number and usage of available GPUs. I am not very experienced with MapReduce; I have read some of the documentation on Apache's website but am still not sure how to approach this problem. Can I run the benchmark, collect all of this information, and then use an algorithm to split up the input dynamically?

Thank you!

"MapReduce is a programming model and an associated implementation for processing and generating large data sets" (from the abstract of the MapReduce paper).

As you said in the comments, your project seems to be compute-intensive rather than data-intensive, so I don't think MapReduce is the right tool for it.

The performance of MapReduce systems depends strongly on an even data distribution. Apache's MapReduce framework uses a simplistic approach to distributing the workload: it assigns roughly the same number of clusters (groups of records that share a key) to each reducer, regardless of how large each cluster is.
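To make the effect concrete, here is a minimal Python sketch (not Hadoop code) of hash-style partitioning on skewed data. The function names, the CRC32 hash, and the sample data are assumptions chosen for illustration; Hadoop's default `HashPartitioner` works on the same principle (key hash modulo the number of reduce tasks) but uses the key's Java `hashCode()`.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    # Deterministic stand-in for a hash partitioner: hash(key) mod reducers.
    # CRC32 is an assumption here; Hadoop uses the key's hashCode().
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(records, num_reducers):
    # Count how many records end up on each reducer.
    loads = [0] * num_reducers
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads

# Hypothetical skewed input: one "hot" key holds 900 of 1000 records.
records = [("hot", i) for i in range(900)] + [(f"key{i}", i) for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
print(loads)
```

All 900 records of the hot key must go to the same reducer, so that reducer receives at least 90% of the data no matter how the remaining keys are spread.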

This load imbalance, which increases processing time, is further amplified when the reducer tasks have high runtime complexity. An adaptive load-balancing strategy is therefore needed: one that estimates the cost of each task from a given cost model and distributes the tasks to the reducers accordingly.
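As a rough illustration of cost-based assignment, the sketch below greedily places the most expensive cluster on the currently least-loaded reducer. This is one simple heuristic, not necessarily the strategy the answer has in mind. The quadratic cost model, the names `cluster_cost` and `assign_clusters`, and the cluster sizes are all hypothetical; in Hadoop such logic would have to be wired in through a custom `Partitioner`, which is not shown here.

```python
import heapq

def cluster_cost(size: int) -> float:
    # Hypothetical cost model: the reducer does quadratic work per cluster.
    return float(size) ** 2

def assign_clusters(cluster_sizes: dict, num_reducers: int):
    # Min-heap of (accumulated cost, reducer id).
    heap = [(0.0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment = {r: [] for r in range(num_reducers)}
    # Largest clusters first, each to the least-loaded reducer so far.
    ordered = sorted(cluster_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for key, size in ordered:
        load, r = heapq.heappop(heap)
        assignment[r].append(key)
        heapq.heappush(heap, (load + cluster_cost(size), r))
    loads = {
        r: sum(cluster_cost(cluster_sizes[k]) for k in keys)
        for r, keys in assignment.items()
    }
    return assignment, loads

# Hypothetical cluster sizes (records per key).
sizes = {"a": 10, "b": 6, "c": 5, "d": 5, "e": 4, "f": 2}
assignment, loads = assign_clusters(sizes, num_reducers=2)
print(assignment, loads)
```

Note that balancing by estimated cost rather than by cluster count puts the single large cluster almost alone on one reducer, because its quadratic cost roughly equals that of the four mid-sized clusters combined.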
