I would like to use multiprocessing in Python with generator functions
Let's say I have a massive list of lists, big_list, and I would like to use multiprocessing to compute values. If I use "traditional" functions that return values, this is straightforward:
import concurrent.futures

def compute_function(list_of_lists):
    return_values = []  ## empty list
    for sublist in list_of_lists:
        new_value = compute_something(sublist)  ## compute something; just an example
        return_values.append(new_value)  ## append to list
    return return_values

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    new_list = list(executor.map(compute_function, big_list))
However, building up lists this way is too memory-intensive, so I would like to use generator functions instead:
import concurrent.futures

def generator_function(list_of_lists):
    for sublist in list_of_lists:
        new_value = compute_something(sublist)  ## compute something; just an example
        yield new_value

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    new_list = list(executor.map(generator_function, big_list))
My problem is that you cannot pickle generators. There are some workarounds to this problem for other data structures, but I don't think there are any for generators.
How could I accomplish this?
A generator is just a fancy loop that preserves its state. It follows the same logic as an iterator: it gives you a next() / hasNext()-style API, so your loop keeps asking the iterator for the next item (as long as it has a next item).
How the generator is implemented is completely up to the developer; it can be driven by
for i in [1,2,3,4]
for line in file
range(100)
All of these share a common requirement: the generator needs to keep its current state so it knows what to yield next. That makes it very much stateful, which in turn makes it a poor choice for multiprocessing.
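To make the pickling problem from the question concrete, here is a minimal sketch (with a placeholder generator body rather than the asker's real compute_something) showing that a generator object itself cannot be pickled, which is what the process pool would need to do to ship a worker's result back:

import pickle

def generator_function(list_of_lists):
    for sublist in list_of_lists:
        yield len(sublist)  ## placeholder for the real computation

gen = generator_function([[1, 2], [3]])
try:
    pickle.dumps(gen)  ## roughly what a worker must do to send its result back
except TypeError as exc:
    print(exc)  ## TypeError: generators cannot be pickled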
You can approach this problem with map-reduce-like logic: split the whole list into small sublists, pass those to the workers, and join all of their output into the final result, as sketched below.
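A rough sketch of that split-and-join approach (compute_something here is only a stand-in for the real per-item work, and the toy big_list just illustrates the shape of the data):

import concurrent.futures
import itertools

def compute_something(item):
    return item * 2  ## stand-in for the real per-item computation

def compute_sublist(sublist):
    ## plain function returning a list, so the worker's result can be pickled
    return [compute_something(item) for item in sublist]

if __name__ == "__main__":
    big_list = [[1, 2, 3], [4, 5], [6, 7, 8]]  ## toy data for illustration
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        ## each worker handles one sublist; join the partial outputs at the end
        new_list = list(itertools.chain.from_iterable(
            executor.map(compute_sublist, big_list)))
    print(new_list)

Each partial result stays as small as one sublist, so you never hold the full flattened output in a single worker at once.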
You can do your enumeration one level deeper into big_list by using itertools.chain.from_iterable to iterate over the items of the sublists.
import concurrent.futures
import itertools

def compute_function(item):
    return compute_something(item)

with concurrent.futures.ProcessPoolExecutor(max_workers=N) as executor:
    for result in executor.map(compute_function,
                               itertools.chain.from_iterable(big_list)):
        print(result)
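When items are dispatched one at a time like this, the per-item round trips to the worker processes can dominate the runtime. ProcessPoolExecutor.map accepts a chunksize argument that batches items per task; a small sketch, assuming the same placeholder compute_something as above:

import concurrent.futures
import itertools

def compute_something(item):
    return item * 2  ## placeholder computation

if __name__ == "__main__":
    big_list = [[1, 2, 3], [4, 5, 6]]  ## toy data
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        ## chunksize batches items sent to each worker process, reducing
        ## inter-process overhead; results still stream back from executor.map
        for result in executor.map(compute_something,
                                   itertools.chain.from_iterable(big_list),
                                   chunksize=1000):
            print(result)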