
Easiest way to run simple loops (that work on different data) over multiple CPU cores in Python?

I am computing some data for every year that is relatively computationally intensive. I have used numba (to great effect) to reduce the time taken to compute the data. However, given that I have 20 years of independent data, I would like to split them into 5 groups of 4 that could run over 4 different CPU cores.

def compute_matrices(self):
    for year in self.years:
        self.xs[year].compute_matrix()

In the above code snippet, the function is a method of a class that has attributes years and xs. Each year is simply an integer, and xs[year] is a cross-section object that houses the data and the compute_matrix() method.

What is the easiest way to split this across multiple cores?

  1. It would be great if there were a Numba-style decorator that could automatically break the loop up, run the pieces over different processes, and glue the results together. Does this exist?

  2. Is my best bet to use Python's multiprocessing module?

So there are a couple of things you could look at for this:

NumbaPro: https://store.continuum.io/cshop/accelerate/ . This is basically Numba on steroids, providing support for many- and multicore architectures. Unfortunately it is not cheap.

Numexpr: https://code.google.com/p/numexpr/ . This is an expression evaluator for NumPy arrays that evaluates expressions in parallel across multiple threads.
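
For example, here is a minimal numexpr sketch (the arrays a and b are just illustrative data); the string expression is compiled once and then evaluated in chunks that are split across the available cores:

import numexpr as ne
import numpy as np

a = np.random.rand(10000000)
b = np.random.rand(10000000)

# The expression is evaluated chunk-by-chunk across multiple threads
result = ne.evaluate("2 * a + 3 * b")

The number of threads used can be tuned with ne.set_num_threads(n).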

Numexpr-Numba (experimental): https://github.com/gdementen/numexpr-numba . As the name suggests this is Numexpr using a Numba backend.

A lot of the answer will depend on what is done in your compute_matrix method.

The fastest solution (in terms of development time) would probably be to split your computations using the multiprocessing library. Note that multiprocessing will be easier to use if your compute_matrix method has no side effects.
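
As a minimal multiprocessing sketch (assuming, as in the IPython example below, that compute_matrix() returns the computed matrix, and that the xs objects can be pickled so they can cross process boundaries; the helper name _compute is illustrative):

from multiprocessing import Pool

def _compute(xs):
    # Runs in a worker process and returns the computed matrix
    return xs.compute_matrix()

def multiprocess_compute_matrices(self):
    years = sorted(self.years)
    # Distribute one xs object per task across a pool of 4 worker processes
    with Pool(processes=4) as pool:
        matrices = pool.map(_compute, [self.xs[year] for year in years])
    # Copy the results back onto the objects in the parent process
    for year, matrix in zip(years, matrices):
        self.xs[year].matrix = matrix

Note that _compute is defined at module level rather than as a lambda because multiprocessing pickles the mapped function, and the standard pickle module cannot serialize lambdas.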

The easiest method I have come across for complex objects is to leverage the IPython Parallel Computing Engine.

Simply start an IPython cluster with: ipcluster start -n 4, or start one from the notebook interface.

Then you can map over the xs objects, distributing them to the different engines:

def multicore_compute_matrices(self):
    from IPython.parallel import Client
    c = Client()
    # - Ordered list of xs objects - #
    years = sorted(self.years)
    xs_list = [self.xs[year] for year in years]
    # - Compute across the engines - #
    results = c[:].map_sync(lambda x: x.compute_matrix(), xs_list)
    # - Assign results back to the current object - #
    for year, result in zip(years, results):
        self.xs[year].matrix = result

Wall-time results using %time:

%time A.compute_matrices()
Wall Time: 5.53s

%time A.multicore_compute_matrices()
Wall Time: 2.58s
