Program using multiprocessing spends most of its time on thread lock?

I'm using multiprocessing.Pool to parallelize parts of a program I'm running. I'm looping over data, calculating something and then returning the result.

Poorly performing code:

from functools import partial
from multiprocessing import Pool

import numpy as np

def likelihood_data(self, data):  # method of the model class
    # Bind the model parameters so each worker call only needs the sample.
    func = partial(likelihood, means=self.means, stddevs=self.stddevs, c_ks=self.c_k)
    if len(data) > 100:
        # A new pool is created (and torn down) on every call.
        pool = Pool(10)
        try:
            likelihoods = pool.map(func, data)
        finally:
            pool.close()
            pool.join()
    else:
        # Small inputs are handled serially with the same bound function.
        likelihoods = [func(sample) for sample in data]
    return np.mean(likelihoods)

def likelihood(sample, means, stddevs, c_ks):  # module-level so it can be pickled for the pool
    # likel_bound and logg are helper functions defined elsewhere in the project.
    likel = []
    for c_k, m, s in zip(c_ks, means, stddevs):
        likel.append(likel_bound(np.log(c_k) + np.sum(logg(sample, m, s))))
    return np.sum(np.exp(likel))

Using cProfile, I found that the poor performance comes from most of the time being spent in {method 'acquire' of '_thread.lock' objects}. I don't understand why that happens when each process is independent of the others. What's going on here?

Edit: Or does it just show up as the longest entry because the main process is waiting for all the workers to finish?
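
A quick way to check that hypothesis: profiling just the pool.map call (a minimal sketch with a trivial stand-in worker, not the original code) should show the same lock entry, since map() blocks on an internal threading primitive in the parent until all results come back:

import cProfile
from multiprocessing import Pool

def work(x):
    return x * x  # trivial stand-in for the real likelihood computation

if __name__ == "__main__":
    with Pool(10) as pool:
        # Profile only the parent process while it waits on the workers.
        cProfile.run("pool.map(work, range(200))")

The dominant entries in that profile should be the lock acquire and the pool plumbing, even though the workers themselves do almost nothing; the acquire time is the parent waiting, not contention between the worker processes.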

My mistake was using multiprocessing on too small an amount of data. Since I was calling likelihood_data many times, all the time went into starting and stopping the pool, with no actual gain from the parallelism.
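
With that in mind, one fix (a minimal sketch: the Model class and its close() method are illustrative names, and likelihood is the module-level function from the question) is to create the pool once and reuse it across calls, so the startup and teardown cost is paid only once:

from functools import partial
from multiprocessing import Pool

import numpy as np

class Model:
    def __init__(self, means, stddevs, c_k):
        self.means = means
        self.stddevs = stddevs
        self.c_k = c_k
        self.pool = Pool(10)  # pay the pool startup cost once

    def likelihood_data(self, data):
        func = partial(likelihood, means=self.means, stddevs=self.stddevs, c_ks=self.c_k)
        if len(data) > 100:
            likelihoods = self.pool.map(func, data)  # no per-call setup/teardown
        else:
            likelihoods = [func(sample) for sample in data]
        return np.mean(likelihoods)

    def close(self):
        self.pool.close()
        self.pool.join()

Alternatively, keep the original structure and raise the len(data) > 100 threshold; when the per-sample computation is cheap, the break-even point for spinning up a fresh pool can easily be in the thousands of samples.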
