
Python: using map and multiprocessing

I'm trying to write a function that takes two arguments and then pass it to multiprocessing.Pool to run it in parallel, but I ran into problems even with this simple function.

import pandas as pd
from multiprocessing import Pool

df = pd.DataFrame()
df['ind'] = [111, 222, 333, 444, 555, 666, 777, 888]
df['ind1'] = [111, 444, 222, 555, 777, 333, 666, 777]

def mult(elem1, elem2):
    return elem1 * elem2

if __name__ == '__main__':
    pool = Pool(processes=4) 
    print(pool.map(mult, df.ind.astype(int).values.tolist(), df.ind1.astype(int).values.tolist()))
    pool.terminate()

It raises this error:

TypeError: unsupported operand type(s) for //: 'int' and 'list'

I can't understand what's wrong. Can anybody explain what this error means and how I can fix it?

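Pool.map takes a single iterable and calls your function with exactly one argument per item. Unlike the built-in map, it does not accept several iterables. Its signature is map(func, iterable, chunksize=None), so in your call the second list is bound to chunksize. When the pool works out how many chunks to split the job into, it computes length // chunksize with an integer length and your list, which is where the 'int' and 'list' in the message come from. (The exact wording of the TypeError can differ between Python versions, but the cause is the same.) You can fix this by doing the following: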

from multiprocessing import Pool
import pandas as pd

df = pd.DataFrame()
df['ind'] = [111, 222, 333, 444, 555, 666, 777, 888]
df['ind1'] = [111, 444, 222, 555, 777, 333, 666, 777]

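def mult(elements):
    # pool.map passes each item as one argument: an (elem1, elem2) tuple
    elem1, elem2 = elements
    return elem1 * elem2

if __name__ == '__main__':
    pool = Pool(processes=4)
    # zip pairs the two columns row by row: (111, 111), (222, 444), ...
    inputs = zip(df.ind.astype(int).values.tolist(), df.ind1.astype(int).values.tolist())
    print(pool.map(mult, inputs))
    pool.terminate()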

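Here I zip your two lists into a sequence of (elem1, elem2) tuples, one per row, so each item that pool.map hands to the function carries both arguments. (In Python 3, zip returns an iterator rather than a list; pool.map converts it to a list internally, so this works either way.) I then changed mult to accept that single tuple and unpack it into elem1 and elem2.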
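On Python 3.3 and later, Pool.starmap does the tuple unpacking for you, so mult can keep its original two-parameter signature. The sketch below is an alternative to the approach above, not part of the original answer. The helper name pairwise_products is hypothetical; it takes the pool as a parameter so the same function also works with a thread pool (multiprocessing.dummy.Pool) if you want to try it without starting processes. It uses plain lists instead of the DataFrame to stay self-contained.

```python
from multiprocessing import Pool


def mult(elem1, elem2):
    return elem1 * elem2


def pairwise_products(pool, xs, ys):
    """Hypothetical helper: [mult(x, y) for x, y in zip(xs, ys)], computed by pool.

    starmap unpacks each (x, y) tuple into mult(x, y), so mult needs
    no manual unpacking.
    """
    return pool.starmap(mult, zip(xs, ys))


if __name__ == '__main__':
    ind = [111, 222, 333, 444, 555, 666, 777, 888]
    ind1 = [111, 444, 222, 555, 777, 333, 666, 777]
    # The with-block terminates the pool on exit; starmap has already
    # returned the complete result list by then.
    with Pool(processes=4) as pool:
        print(pairwise_products(pool, ind, ind1))
```

With the DataFrame from the question, you would call pairwise_products(pool, df['ind'].tolist(), df['ind1'].tolist()). Like map, starmap returns the results in input order.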

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 