Python 3: Parallel diagonalization of multiple matrices

Question

I am trying to improve the performance of some code of mine, that first constructs a 4x4 matrix depending on two indices, diagonalizes this matrix and then stores the eigenvectors of each diagonalization of each matrix in an 4-dimensional array. At the moment I am just going through all the indices serially and then store the eigenvectors in its place in the 4-dimensional array. Now, I am wondering if it is possible to parallelize this a little bit by using threading or something similar such that each thread would diagonalize one matrix and then store it in its place. The problem I have is, what are my limitations in doing this? Would I run into problems when different threads want to write into the resulting 4-dim. array at the same time and do I have to use a lock in order to prevent this? I am sorry if this question is trivial, but by searching I was not able to find anything related and my knowledge about threading is very limited. A minimal example would be

from numpy.linalg import eigh as eigh2
from scipy import *

spectrum = zeros([L//2,L//2,4,4],complex)
for i in range(0,L//2):
    for j in range(0,L//2):
        k = [-(2 * i*2*pi/L),-(2 * j*2*pi/L)]
        H = ones([4,4],complex)
        energies, states = eigh2(H)
        spectrum[i,j,:,:] = states

Note that I have exchanged the function that constructs the matrix in dependence of k for some constant matrix for sake of brevity.

I would really appreciate any help or pointers to resources how I could implement some parallelizations. Is threading a realistic way of improving the performance?

Answer 1

The short answer is that yes, you probably need locks—but if you can reorganize your problem, that may be a lot better than locking.

The long answer is a bit involved, especially since I don't know how much you already know.

In general, threading doesn't do much good in CPython for CPU-bound code, because of the Global Interpreter Lock , which prevents any threads from interpreting a line (actually, bytecode) of Python if another thread is in the middle of doing so. However, NumPy has code that specifically releases the GIL in certain places to allow threading to work better, so if you're CPU-bound within low-level NumPy algorithms , threading actually can work. The docs are not always clear about which functions do this and which don't, so you may have to test it yourself just to find out if parallelizing will help here. (A quick&dirty way to do this is to hack up a version of your code that just does the computations without storing them anywhere, run it across N threads, and see how many cores are busy while you do it.)

Now, in general, in CPython, locks aren't necessary around certain kinds of operations, including __setitem__ on simple types—but that's because of that same GIL, so it isn't going to help you here. If you have multiple operations all trying to write to the same array, they will need a lock around that array.

But there may be a better way around this. If you can find a way to divide the array into smaller arrays, only one of which is being modified at any given time, you don't need any locks. Or, if you can have the threads return smaller arrays that can be assembled by a single master thread into the final answer, instead of working in-place in the first place, that also works.

But before you go doing that… in some cases, NumPy (or, rather, one of the libraries it's using) is already auto-parallelizing things for you, or could be if you built it differently. Or it could be SIMD-vectorizing things in a way that actually gives more speedup than threading, which you could end up breaking. And so on.

So, make sure you have a properly-optimized NumPy with all the optional prereqs installed before you try anything. Then make sure it's only using one core as-is. Then build a test scaffolding so you can compare different implementations. And then you can try out each lock-based, non-sharing, and non-mutating algorithm you can come up with to see if the parallelism helps more than the extra stuff hurts.

Python 3: Parallel diagonalization of multiple matrices

Question

1 answers

solution1
2 2014-10-10 22:44:03

Python 3: Parallel diagonalization of multiple matrices

Question

1 answers

solution1 2 2014-10-10 22:44:03

solution1
2 2014-10-10 22:44:03