简体   繁体   中英

Python: how to parallelize a loop with dictionary

EDITED: I have a code which looks like:

__author__ = 'feynman'

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def MC_Surface(volume, mc_vol):

    Perm_area = {
        "00000000": 0.000000,
        "11111111": 0.000000,
         ...
         ...
        "11100010": 1.515500,
        "00011101": 1.515500
    }

    cdef int j, i, k
    for k in range(volume.shape[2] - 1):
        for j in range(volume.shape[1] - 1):
            for i in range(volume.shape[0] - 1):

                pattern = '%i%i%i%i%i%i%i%i' % (
                    volume[i, j, k],
                    volume[i, j + 1, k],
                    volume[i + 1, j, k],
                    volume[i + 1, j + 1, k],
                    volume[i, j, k + 1],
                    volume[i, j + 1, k + 1],
                    volume[i + 1, j, k + 1],
                    volume[i + 1, j + 1, k + 1])

                mc_vol[i, j, k] = Perm_area[pattern]

    return mc_vol

In the hope to speed up, it was modified to:

    {
      ...
      ...
      "11100010": 1.515500,
      "00011101": 1.515500
    }
    keys = np.array(Perm_area.keys())
    values = np.array(Perm_area.values())

    starttime = time.time()
    tmp_vol = GetPattern(volume)
    print 'time to populate the key array: ', time.time() - starttime

    cdef int i
    starttime=time.time()
    for i, this_key in enumerate(keys):
        mc_vol[tmp_vol == this_key] = values[i]

    print 'time for the loop: ', time.time() -starttime
    return mc_vol

def GetPattern(volume):
    a = (volume.astype(np.int)).astype(np.str)
    output = a.copy()  # Central voxel
    output[:, :-1, :] = np.char.add(output[:, :-1, :], a[:, 1:, :])  # East
    output[:-1, :, :] = np.char.add(output[:-1, :, :], a[1:, :, :])  # South
    output[:-1, :-1, :] = np.char.add(output[:-1, :-1, :], a[1:, 1:, :])  # SouthEast
    output[:, :, :-1] = np.char.add(output[:, :, :-1], a[:, :, 1:])  # Down
    output[:, :-1, :-1] = np.char.add(output[:, :-1, :-1], a[:, 1:, 1:])  # DownEast
    output[:-1, :, :-1] = np.char.add(output[:-1, :, :-1], a[1:, :, 1:])  # DownSouth
    output[:-1, :-1, :-1] = np.char.add(output[:-1, :-1, :-1], a[1:, 1:, 1:])  # DownSouthEast
    output = output[:-1, :-1, :-1]
    del a
    return output

This takes longer for a 3D array of size say 500^3. Here, tmp_vol 3D array of strings. For example: if say tmp_vol[0,0,0] = "00000000" then mc_vol[0,0,0] = 0.00000. Alternatively, I can get rid of mc_vol and write if tmp_vol[0,0,0] = "00000000" then tmp_vol[0,0,0] = 0.00000.

Here, The for-loop takes a lot of time, I see that only one CPU is used. I tried to map them in parallel using map and lambda but ran into errors. I am really new to python so any hints will be great.

Since I don't quite understand your code and you said "any hints will be great", I'll give you some general suggestions. Basically you want to speed up a for loop

for i, this_key in enumerate(keys):

What you could do is split the keys array into several parts, something like this:

length = len(keys)
part1 = keys[:length/3]
part2 = keys[length/3: 2*length/3]
part3 = keys[2*length/3:]

Then deal with each part in a subprocess:

from concurrent.futures import ProcessPoolExecutor

def do_work(keys):
    for i, this_key in enumerate(keys): 
        mc_vol[tmp_vol == this_key] = values[i]

with ProcessPoolExecutor(max_workers=3) as e:
    e.submit(do_work, part1)
    e.submit(do_work, part2)
    e.submit(do_work, part3)

return mc_vol

And that's it.

First, dictionary lookup takes closely constant time, although array equality check is O(N). Thus you should loop over your array and not your dictionary.

Second, you could save much time with nested list comprehension (change python loop to C loop).

mc_vol = [[Perm_area[key] for key in row] for row in tmp_vol]

This gives you a list of lists, so you can completely avoid numpy in this case. Although if you need a numpy array, just convert:

mc_vol = np.array(mc_vol)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM