
How to get multiprocessing.Pool().starmap() to return iterable

I'm trying to construct a dataframe from the inputs of a function as well as its output. Previously I was using for loops:

for i in range(x):
    for j in range(y):
        k = func(i, j)
        # place i, j, k into the dataframe

However, the ranges were quite big, so I tried to speed it up with multiprocessing.Pool():

with mp.Pool() as pool:
    result = pool.starmap(func, ((i, j) for j in range(y) for i in range(x)))
    # place result into the dataframe

However, with the pool I no longer have access to i and j, as they are merely inputs to the function.

I tried making the function return its inputs, but that doesn't really scale as the number of for loops increases. So how can I get back the iterables passed into starmap?

Your starmap version and your plain loop version are not equivalent. When a generator expression contains multiple for clauses, the leftmost clause is the outer loop. So the call should rather be:

result = pool.starmap(func, ((i, j) for i in range(x) for j in range(y)))
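
To make the ordering difference concrete, here is a small sketch comparing a nested for loop with both generator-expression orderings. The values of x and y are arbitrary and chosen only for illustration:

```python
# Compare the iteration order of a nested for loop with two comprehensions.
x, y = 2, 3  # illustrative sizes, not from the question

nested = []
for i in range(x):
    for j in range(y):
        nested.append((i, j))

outer_i = [(i, j) for i in range(x) for j in range(y)]  # i is the outer loop
outer_j = [(i, j) for j in range(y) for i in range(x)]  # j is the outer loop

print(nested == outer_i)  # True
print(nested == outer_j)  # False
print(outer_j)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Only the ordering with i first reproduces the sequence of (i, j) pairs that the nested loops visit.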

Coming back to the question: as I mentioned in the comments, starmap returns the task results in the same order they were submitted. Since the only thing you wanted to parallelize was the func calls, you can collect all the results in one list, chunk it into rows of length y (the number of columns), and run another set of for loops in the main process to get i, j, and the return value of func at the same time. Example:

import multiprocessing as mp


def func(i, j):
    return f"{i}{j}"


# https://stackoverflow.com/a/17483656/16310741
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]


if __name__ == "__main__":
    x = 3
    y = 4

    with mp.Pool() as pool:
        # ['00', '01', '02', '03', '10', '11', '12', '13', '20', '21', '22', '23']
        result = pool.starmap(func, ((i, j) for i in range(x) for j in range(y)))

        # [['00', '01', '02', '03'], ['10', '11', '12', '13'], ['20', '21', '22', '23']]
        result = chunks(result, y)  

        for i in range(x):
            for j in range(y):
                curr_result = result[i][j]
                print(i, j, curr_result)
                # Do something with i, j, and curr_result
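
If the number of nested loops grows, one possible alternative to chunking is to materialize the argument tuples once and zip them with the results, which again relies on starmap preserving submission order. This is only a sketch: make_rows is a hypothetical helper name, func is the same stand-in as above, and itertools.product is used here in place of the nested generator expression (it iterates with the first range as the outermost loop):

```python
import itertools
import multiprocessing as mp


def func(i, j):
    # Stand-in for the real function, same as in the example above.
    return f"{i}{j}"


def make_rows(args, results):
    # Hypothetical helper: pair each argument tuple with its result.
    # Correct only because starmap returns results in submission order.
    return [(*a, r) for a, r in zip(args, results)]


if __name__ == "__main__":
    x, y = 3, 4
    # Same order as: for i in range(x): for j in range(y)
    args = list(itertools.product(range(x), range(y)))

    with mp.Pool() as pool:
        results = pool.starmap(func, args)

    rows = make_rows(args, results)
    print(rows[:3])  # [(0, 0, '00'), (0, 1, '01'), (0, 2, '02')]
```

Adding another loop then only means adding another range to itertools.product. The resulting rows could be passed to something like pandas.DataFrame(rows, columns=["i", "j", "k"]) if a dataframe is the goal; the column names are placeholders.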
