
parallel/multithread differential evolution in python

I'm trying to model a biochemical process, and I structured my question as an optimization problem that I solve using differential_evolution from scipy.
So far, so good: I'm pretty happy with the implementation of a simplified model with 15-19 parameters.
I expanded the model, and now, with 32 parameters, it is taking way too long. Not totally unexpected, but still an issue, hence the question.

I've seen:
- an almost identical question for R Parallel differential evolution
- and a github issue https://github.com/scipy/scipy/issues/4864 on the topic

but I would like to stay in Python (the model is part of a Python pipeline), and the pull request has not led to an officially accepted solution yet, although some options have been suggested.

Also, I can't parallelize the code within the function to be optimised, because it is a series of sequential calculations, each requiring the result of the previous step. The ideal option would be something that evaluates some individuals in parallel and returns them to the population.

Summing up:
- Is there any option within scipy that allows parallelization of differential_evolution that I dumbly overlooked? (Ideal solution)
- Is there a suggestion for an alternative algorithm in scipy that is either (way) faster in serial or possible to parallelize?
- Is there any other good package that offers parallelized differential evolution functions? Or other applicable optimization methods?
- Sanity check: am I overloading DE with 32 parameters, and do I need to radically change approach?

PS
I'm a biologist; formal math/statistics isn't really my strength, so any formula-to-English translation would be hugely appreciated :)

PPS
As an extreme option I could try to migrate to R, but I can't code C/C++ or other languages.

Scipy's differential_evolution can now be used in parallel extremely easily, by specifying the workers keyword:

workers int or map-like callable, optional

If workers is an int the population is subdivided into workers sections and evaluated in parallel (uses multiprocessing.Pool). Supply -1 to use all available CPU cores. Alternatively supply a map-like callable, such as multiprocessing.Pool.map for evaluating the population in parallel. This evaluation is carried out as workers(func, iterable). This option will override the updating keyword to updating='deferred' if workers != 1. Requires that func be pickleable.

New in version 1.2.0.

scipy.optimize.differential_evolution documentation
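For illustration, here is a minimal sketch of the workers keyword on a toy 2-D Rastrigin function (the function and bounds are just for the example, not the biochemical model from the question). Note that with workers != 1 the objective must be pickleable, i.e. defined at module level:

```python
import numpy as np
from scipy.optimize import differential_evolution

# 2-D Rastrigin test function; global minimum 0 at the origin.
# Must be a module-level function so it can be pickled by the worker pool.
def rastrigin(p):
    return 10 * len(p) + np.sum(p**2 - 10 * np.cos(2 * np.pi * p))

if __name__ == "__main__":
    bounds = [(-5.12, 5.12)] * 2
    # workers=-1 uses every available core; scipy automatically switches
    # to updating='deferred' whenever workers != 1
    result = differential_evolution(rastrigin, bounds, workers=-1, seed=1)
    print(result.x, result.fun)
```

The `if __name__ == "__main__":` guard matters on platforms where multiprocessing spawns fresh interpreters (e.g. Windows).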

Thanks to @jp2011 for pointing to pygmo

First, it is worth noting the difference from pygmo 1, since the first link on Google still points to the older version.

Second, multiprocessing islands are available only for Python 3.4+.

Third, it works. The processes I started when I first asked the question are still running as I write, while the pygmo archipelago finished an extensive test of all 18 DE variants available in saDE in less than 3 h. The version compiled with Numba, as suggested in https://esa.github.io/pagmo2/docs/python/tutorials/coding_udp_simple.html, will probably finish even earlier. Chapeau.

I personally find it a bit less intuitive than the scipy version, given the need to build a new class (vs a single function in scipy) to define the problem, but that is probably just personal preference. Also, the mutation/crossover parameters are defined less clearly, which might be a bit obscure for someone approaching DE for the first time.
But since serial DE in scipy just isn't cutting it: welcome, pygmo(2).

Additionally, I found a couple of other options claiming to parallelize DE. I didn't test them myself, but they might be useful to someone stumbling on this question.

Platypus, focused on multiobjective evolutionary algorithms https://github.com/Project-Platypus/Platypus

Yabox
https://github.com/pablormier/yabox

From the Yabox creator, a detailed yet (IMHO) crystal-clear explanation of DE: https://pablormier.github.io/2017/09/05/a-tutorial-on-differential-evolution-with-python/
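Since that tutorial is essentially a walkthrough of the classic DE/rand/1/bin scheme, a minimal NumPy sketch of the same loop may help make the mutation and crossover parameters mentioned above concrete (function names and default values here are illustrative, not any library's API):

```python
import numpy as np

def de(fobj, bounds, mut=0.8, crossp=0.7, popsize=20, its=200, seed=0):
    """Minimal DE/rand/1/bin: mutation, binomial crossover, greedy selection."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dims = len(bounds)
    pop = rng.uniform(lo, hi, (popsize, dims))       # random initial population
    fitness = np.array([fobj(p) for p in pop])
    for _ in range(its):
        for i in range(popsize):
            # pick three distinct individuals other than i
            idxs = [j for j in range(popsize) if j != i]
            a, b, c = pop[rng.choice(idxs, 3, replace=False)]
            mutant = np.clip(a + mut * (b - c), lo, hi)   # mutation (F = mut)
            cross = rng.random(dims) < crossp             # binomial crossover mask
            if not cross.any():
                cross[rng.integers(dims)] = True          # keep at least one gene
            trial = np.where(cross, mutant, pop[i])
            f = fobj(trial)
            if f < fitness[i]:                            # greedy selection
                pop[i], fitness[i] = trial, f
    best = fitness.argmin()
    return pop[best], fitness[best]
```

The inner loop over individuals is exactly the part that parallel implementations farm out to workers or islands.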

I've been having exactly the same problem. Perhaps you could try pygmo, which supports different optimisation algorithms (including DE) and has a model for parallel computation. However, I'm finding that the community is not as big as it is for scipy. Their tutorials, documentation, and examples are good quality, and one can get things working from them.

I suggest the batch mode of PyFDE: https://pythonhosted.org/PyFDE/tutorial.html#batch-mode In batch mode, the fitness function is called only once per iteration, to evaluate the fitness of the whole population at once.

The example without batch mode:

import pyfde
from math import cos, pi
import time
import numpy

t1 = time.time()

def fitness(p):
    # Rastrigin function, evaluated for one individual at a time
    x, y = p[0], p[1]
    val = 20 + (x**2 - 10*cos(2*pi*x)) + (y**2 - 10*cos(2*pi*y))
    return -val

solver = pyfde.ClassicDE(fitness, n_dim=2, n_pop=40, limits=(-5.12, 5.12))
solver.cr, solver.f = 0.9, 0.45
best, fit = solver.run(n_it=150)
t2 = time.time()
print("Estimates: ", best)
print("Normal mode elapsed time (s): ", t2 - t1)

The batch mode example:

t1 = time.time()

def vec_fitness(p, fit):
    # Batch mode: p holds the whole population, shape (n_pop, n_dim);
    # results are written in place into the fit array
    x, y = numpy.array(p[:, 0]), numpy.array(p[:, 1])
    val = 20 + (x**2 - 10*numpy.cos(2*pi*x)) + (y**2 - 10*numpy.cos(2*pi*y))
    fit[:] = -val

solver = pyfde.ClassicDE(vec_fitness, n_dim=2, n_pop=40, limits=(-5.12, 5.12), batch=True)
solver.cr, solver.f = 0.9, 0.45
best, fit = solver.run(n_it=150)
t2 = time.time()
print("Estimates: ", best)
print("Batch mode elapsed time (s): ", t2 - t1)

The output is:

Estimates: [1.31380987e-09 1.12832169e-09]
Normal mode elapsed time (s): 0.015959978103637695

Estimates: [2.01733383e-10 1.23826873e-10]
Batch mode elapsed time (s): 0.006017446517944336


It's 1.5x faster here, but only for a simple problem; for a complex problem you can see a >10x speed-up. The code runs on a single CPU core (no multiprocessing), and the performance improvement comes from vectorization, i.e. SIMD-style (single instruction, multiple data) evaluation of the whole population at once. Combining vectorization with parallel/multiprocessing will compound the improvement.
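On the same theme: if you stay with scipy, versions from 1.9 onwards (i.e. newer than this question) also expose a vectorized keyword on differential_evolution, which evaluates the whole population in one call, much like PyFDE's batch mode. A sketch, assuming scipy >= 1.9:

```python
import numpy as np
from scipy.optimize import differential_evolution

def sphere_vec(x):
    # With vectorized=True, x has shape (n_dims, n_candidates) and the
    # function must return one fitness per candidate, shape (n_candidates,)
    return np.sum(x ** 2, axis=0)

result = differential_evolution(sphere_vec, [(-5.12, 5.12)] * 2,
                                vectorized=True, polish=False, seed=1)
print(result.x, result.fun)
```

polish=False is used here because the final polishing step evaluates single candidates; scipy forces updating='deferred' when vectorized is set, just as it does with workers.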
