
Parallelizing a Python loop

I'm a bit lost between joblib, multiprocessing, etc.

What's the most effective way to parallelize a for loop, based on your experience?

For example:

for i, p in enumerate(patches[ss_idx]):
    bar.update(i + 1)                    # progress bar
    h_features.append(calc_haralick(p))  # one feature vector per patch
import mahotas as mt
import numpy as np

def calc_haralick(roi):
    # Haralick texture features of the patch, averaged over the four directions
    texture_features = mt.features.haralick(roi)
    mean_ht = texture_features.mean(axis=0)

    # keep the first 9 of the 13 features
    return np.array(mean_ht[0:9])

It iterates over the image patches and extracts Haralick texture features from each.

And this is how I get the patches:

from numpy.lib import stride_tricks

h_neigh = 11  # haralick neighbourhood
size = h_neigh
shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
strides = 2 * img.strides  # duplicating the stride tuple gives one sliding window per pixel
patches = stride_tricks.as_strided(img, shape=shape, strides=strides)
patches = patches.reshape(-1, size, size)
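
(As an aside, on NumPy 1.20 or newer the same patches can be produced without the manual stride arithmetic, via sliding_window_view — a sketch of the equivalent call:)

from numpy.lib.stride_tricks import sliding_window_view

# same (n_windows, size, size) result as the as_strided version above
patches = sliding_window_view(img, (size, size)).reshape(-1, size, size)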

Sorry if any of this information is superfluous.

Your images appear to be simple two-dimensional NumPy arrays, and patches is a list or array of those. I assume ss_idx is an index array (i.e., not a single integer), so that patches[ss_idx] remains something that can be iterated over (as in your example).

In that case, simply use multiprocessing.Pool.map:

import multiprocessing as mp

nproc = 10  # number of worker processes
with mp.Pool(nproc) as pool:
    # results come back in the same order as the input patches
    h_features = pool.map(calc_haralick, patches[ss_idx])

See the first basic example in the multiprocessing documentation.

If you leave out nproc or set it to None, all available cores will be used.
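
Since you mentioned joblib: it wraps the same process-based idea in a slightly higher-level API. A minimal sketch of the equivalent call (n_jobs=-1 would use all cores):

from joblib import Parallel, delayed

# joblib's default backend is process-based, so this parallelizes CPU-bound work too
h_features = Parallel(n_jobs=nproc)(
    delayed(calc_haralick)(p) for p in patches[ss_idx]
)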


A potential problem with multiprocessing is that it will create nproc identical Python processes and copy all the relevant data to those processes. If your images are large, this will cause considerable overhead.
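
One knob worth knowing here is map's chunksize parameter: it sends the work items to the workers in batches, which reduces the number of inter-process round-trips (though not the total amount of data copied). A sketch, with an arbitrary batch size:

with mp.Pool(nproc) as pool:
    # pickle and ship patches in batches of 64 instead of one at a time
    h_features = pool.map(calc_haralick, patches[ss_idx], chunksize=64)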

In such a case, it may be worth splitting your Python program into separate programs, where calculating the features of a single image is one independent program. That program would need to handle reading a single image and writing the features; a sketch of such a worker follows below. You'd then wrap everything in e.g. a bash script that loops over all images, taking care to use only a certain number of cores at the same time (e.g., launch background processes, but wait after every 10 images). The next step/program requires reading the independent feature files into a multi-dimensional array, but from there, you can continue your old program.
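
A minimal sketch of such a single-image worker (all file names here are illustrative, and it assumes mahotas can read your image format):

# extract_features.py -- hypothetical worker: one image in, one feature file out
import sys

import mahotas as mt
import numpy as np
from numpy.lib import stride_tricks

def calc_haralick(roi):
    return np.array(mt.features.haralick(roi).mean(axis=0)[0:9])

def main(image_path, out_path):
    img = mt.imread(image_path, as_grey=True).astype(np.uint8)  # haralick needs integer pixels
    size = 11
    shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
    patches = stride_tricks.as_strided(img, shape=shape, strides=2 * img.strides)
    patches = patches.reshape(-1, size, size)
    features = np.array([calc_haralick(p) for p in patches])
    np.save(out_path, features)  # the next step reads these .npy files back

if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2])

You could then run e.g. python extract_features.py img_042.png img_042_features.npy from the bash loop, limiting how many instances run at once.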

While this is more work, it may save some copying overhead (though it introduces extra I/O overhead, in particular writing the separate feature files). It also has the added advantage that it is fairly easy to run distributed, should that possibility ever arise.


Try multiprocessing first, keeping an eye on memory and CPU usage (if nothing happens for a long time, it may be copying overhead). If that doesn't pan out, try another method.
