
Python Parallel Processing Syntax for Functions with Multiple Arguments

Hi, I'm currently trying to run a function in parallel on multiple cores because of the program's long run time. I could not find the multiprocessing syntax for functions with multiple arguments. I have attached my code below and have no idea how to fix the syntax.

import pandas as pd
import numpy as np
import numpy.random as rdm
import matplotlib.pyplot as plt
import math as m
import random as r
import time
import multiprocessing
from joblib import Parallel, delayed

def firstLoop(r1, r2, d):
    count = 0
    for i in range(r1):
        for j in range(r2):
            if(findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d):
                count = count + 1
    return count

food1 = range(r1)
atm1 = range(r2)
d = 100
num_cores = multiprocessing.cpu_count()
results = Parallel(n_jobs=num_cores)(delayed(firstLoop)(for i in food1, for j in atm1, d))

I'm trying to run firstLoop on multiple cores over all the elements in food1 and atm1, but I am unsure of the syntax for this.
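
For reference, joblib passes multiple arguments positionally through delayed(...), so the call shape being asked about looks roughly like the sketch below (the per-pair function f and the sizes here are made up purely for illustration):

from joblib import Parallel, delayed
import multiprocessing

def f(i, j, d):
    # hypothetical per-pair check standing in for findDistance(...) <= d
    return 1 if abs(i - j) <= d else 0

if __name__ == '__main__':
    r1, r2, d = 100, 100, 5
    num_cores = multiprocessing.cpu_count()
    # delayed(f)(i, j, d) records the positional arguments for one call;
    # the surrounding generator expression enumerates every (i, j) pair
    results = Parallel(n_jobs=num_cores)(
        delayed(f)(i, j, d) for i in range(r1) for j in range(r2)
    )
    print(sum(results))

Each delayed(f)(i, j, d) records one call with its arguments, and Parallel runs those calls across the worker processes.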

EDIT:

startTime = time.time()
with mp.Pool(processes = mp.cpu_count()) as p:
    p.starmap(f, [(x, y, d) for x in range(r1) for y in range(r2) for d in range(100, 200, 100)])
print(time.time() - startTime)

An easy way to do multiprocessing with a function that has several parameters is to use Pool() and starmap():

import multiprocessing as mp

def f(x, y):
    return x+y

with mp.Pool(processes = mp.cpu_count()) as p:
    # starmap unpacks each (x, y) tuple into the positional arguments of f
    p.starmap(f, [(x, y) for x in range(10) for y in range(10)])

For this to work, the function f must perform a single elementary step.

In your code:

def firstLoop(r1, r2, d):
    count = 0
    for i in range(r1):
        for j in range(r2):
            if(findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d):
                count = count + 1
    return count

The elementary step is findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d. A possible function f could be:

def f(i, j, d):
    if findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d:
        return 1
    else:
        return 0

Then we initialize the result array and the variables:

r1 =   # number of rows in dat1
r2 =   # number of rows in dat2
d = 100
results = np.zeros(shape=(1, r1*r2))

# Compute
with mp.Pool() as p:
    results = p.starmap(f, [(i, j, d) for i in range(r1) for j in range(r2)])

Then to count, you simply need to sum the array. I didn't test the code; the idea is there, but some adaptation might be needed, especially in how the results are returned from the map function and how they are stored. I don't use this syntax often, since I mainly store the results in a file at the end of the multiprocessed function.

Maybe the creation of results is not actually needed beforehand. Not sure.
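
Putting the pieces together, a self-contained sketch of that counting step could look like the following; the Euclidean findDistance and the random dat1/dat2 arrays are hypothetical stand-ins for the original data, included only so the example runs on its own:

import math
import multiprocessing as mp
import numpy as np

# hypothetical stand-ins for the original data (seeded so every process sees the same values)
rng = np.random.default_rng(0)
dat1 = rng.random((50, 2)) * 1000
dat2 = rng.random((50, 2)) * 1000

def findDistance(x1, x2, y1, y2):
    # plain Euclidean distance between (x1, y1) and (x2, y2)
    return math.hypot(x1 - x2, y1 - y2)

def f(i, j, d):
    # elementary step: 1 if the (i, j) pair is within distance d, else 0
    return 1 if findDistance(dat1[i, 0], dat2[j, 0], dat1[i, 1], dat2[j, 1]) <= d else 0

if __name__ == '__main__':
    r1, r2, d = len(dat1), len(dat2), 100
    with mp.Pool() as p:
        results = p.starmap(f, [(i, j, d) for i in range(r1) for j in range(r2)])
    # starmap returns a plain list, so summing it gives the count directly
    count = sum(results)
    print(count)

Since starmap already returns the results as a plain list, the np.zeros pre-allocation above is indeed not needed.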

# To use a function with many arguments, you have to use a wrapper with one
# argument (a list of arguments), and you call the wrapper in pool.map(...)
# with the list of argument lists. In the wrapper you can call the function
# with optional parameters.
import multiprocessing

def hello_wrapper(arguments):
    return hello(arguments[0], arguments[1], age=arguments[2],
                 country=arguments[3], debug=arguments[4])

def hello(name, firstname, age=0, country='France', debug=False):
    return 'I am', name, firstname, 'I am', age, 'years old and I live in ' + country

if __name__ == '__main__':
    inputs = []
    # in real code, feed this with a loop
    inputs.append(['SMITH', 'John', 18, 'USA', True])
    inputs.append(['COOL', 'Bob', 14, 'USA', True])
    inputs.append(['SMITH', 'John', 18, 'USA', True])
    ...
    pool = multiprocessing.Pool(processes=10)
    outputs = pool.map(hello_wrapper, inputs)
    # if you want to do this in a class, hello_wrapper(...) and hello(...) must be staticmethods
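
The wrapper can also be avoided on Python 3 by using Pool.starmap, as in the earlier answer, since it unpacks each tuple into positional arguments; a minimal sketch (the string formatting is only illustrative):

import multiprocessing

def hello(name, firstname, age=0, country='France', debug=False):
    # debug is accepted but unused here, purely to mirror the signature above
    return 'I am %s %s, I am %d years old and I live in %s' % (name, firstname, age, country)

if __name__ == '__main__':
    inputs = [
        ('SMITH', 'John', 18, 'USA', True),
        ('COOL', 'Bob', 14, 'USA', True),
    ]
    with multiprocessing.Pool(processes=10) as pool:
        # starmap unpacks each tuple into the positional arguments of hello
        outputs = pool.starmap(hello, inputs)
    print(outputs)

With this approach, optional parameters such as debug have to be passed positionally in the tuples.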
