
How to parallelize a nested for loop in Python?

OK, here is my problem: I have a nested for loop in my program which runs on a single core. Since the program spends over 99% of its run time in this nested for loop, I would like to parallelize it. Right now I have to wait 9 days for the computation to finish. I tried to implement a parallel for loop using the multiprocessing library, but I only found very basic examples and could not transfer them to my problem. Here are the nested loops with random data:

import numpy as np

dist_n = 100
nrm = np.linspace(1,10,dist_n)

data_Y = 11000
data_I = 90000
I = np.random.randn(data_I, 1000)
Y = np.random.randn(data_Y, 1000)
dist = np.zeros((data_I, dist_n))

for t in range(data_Y):
    for i in range(data_I):
        d = np.abs(I[i] - Y[t])
        for p in range(dist_n):
            dist[i,p] = np.sum(d**nrm[p])/nrm[p]

    print(dist)

Please give me some advice on how to make it parallel.

There's a small overhead to starting a process (50 ms or more, depending on data size), so it's generally best to multiprocess the largest block of code possible. From your comment it sounds like each iteration of t is independent, so we should be free to parallelize over it.

When Python creates a new process you get a copy of the main process, so all your global data is available, but when each process writes to that data it writes to its own local copy. This means dist[i,p] won't be available to the main process unless you explicitly pass it back with a return (which has some overhead). In your situation, if each process writes its part of dist to a separate file you should be fine; just don't write to the same file unless you implement some kind of mutex/locking.

#!/usr/bin/env python3
import time
import multiprocessing as mp
import numpy as np

data_Y = 11 #11000
data_I = 90 #90000
dist_n = 100
nrm = np.linspace(1,10,dist_n)
I = np.random.randn(data_I, 1000)
Y = np.random.randn(data_Y, 1000)
dist = np.zeros((data_I, dist_n))

def worker(t):
    st = time.time()
    for i in range(data_I):
        d = np.abs(I[i] - Y[t])
        for p in range(dist_n):
            dist[i,p] = np.sum(d**nrm[p])/nrm[p]
    # Here - each worker opens a different file and writes to it
    print('Worker time %4.3f ms' % (1000.*(time.time()-st)))


if 1:   # single process
    st = time.time()
    for x in map(worker, range(data_Y)):
        pass
    print('Single-process total time is %4.3f seconds' % (time.time()-st))
    print()

if 1:   # multi-process
    pool = mp.Pool(28) # try 2X num procs and inc/dec until cpu maxed
    st = time.time()
    for x in pool.imap_unordered(worker, range(data_Y)):
        pass
    print('Multiprocess total time is %4.3f seconds' % (time.time()-st))
    print()

If you scale data_Y/data_I back up to their original sizes, the speed-up should approach the theoretical limit.
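If you actually need dist back in the parent process instead of writing per-worker files, the usual pattern is to have each worker return its block and collect the results from imap_unordered. The sketch below is illustrative, not part of the original answer: it assumes the same module-level globals as above (and a fork-based start method, as on Linux), and worker_return is a hypothetical name.

import multiprocessing as mp
import numpy as np

def worker_return(t):
    # Compute the whole (data_I, dist_n) block for this t and return it,
    # rather than writing into the process-local copy of the global dist.
    dist_t = np.zeros((data_I, dist_n))
    for i in range(data_I):
        d = np.abs(I[i] - Y[t])
        for p in range(dist_n):
            dist_t[i, p] = np.sum(d**nrm[p]) / nrm[p]
    return t, dist_t

if __name__ == '__main__':
    results = {}
    with mp.Pool() as pool:
        # Each (t, dist_t) tuple is pickled and sent back to the parent,
        # so the return overhead mentioned above applies per task.
        for t, dist_t in pool.imap_unordered(worker_return, range(data_Y)):
            results[t] = dist_t

Returning a full (data_I, dist_n) array per t is exactly the return overhead mentioned above, so writing per-worker files stays the cheaper option if the parent doesn't need the arrays in memory.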
