After reading a large table from DB2 (some tables have 100 million rows) in chunks, I convert the generator object into an iterator using itertools.islice. I pass the iterator to multiprocessing's pool.map, which calls a function that extracts these chunks to CSV in parallel.
It works, but before the parallel run starts, pool.map converts the ITERATOR to a LIST, which consumes a lot of time. Is there a way I can avoid this list creation, or convert to a list faster? I also tried POOL.IMAP, but my notebook kernel dies when I run the program. To use IMAP, I would have to convert the iterator to a list, which again takes time. Any thoughts?
import itertools as it
import pandas as pd

generator_df = pd.read_sql(query2, test_connection_forbankcv_connection, chunksize=5000)
iterable_slice = list(it.islice(generator_df, slice_start, slice_end))
results = p.imap(chunk_to_csv, iterable_slice, 1)
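(For reference: Pool.imap accepts any iterable, so in principle it can consume the islice lazily without a list() call. Below is a minimal, hedged sketch of that pattern; chunk_to_csv and chunk_source are stand-ins for the question's CSV writer and the pd.read_sql chunk generator, not the actual DB code:)

```python
# Sketch only: imap pulls items from the iterator lazily, so no list()
# materialization is needed. All names below are stand-ins.
import itertools as it
from multiprocessing import Pool

def chunk_to_csv(chunk):
    # stand-in: the real function would write the chunk to a CSV file
    return sum(chunk)

def chunk_source():
    # stand-in for pd.read_sql(query2, ..., chunksize=5000)
    for i in range(10):
        yield [i] * 3

if __name__ == "__main__":
    iterable_slice = it.islice(chunk_source(), 2, 6)  # no list() call
    with Pool(2) as p:
        results = list(p.imap(chunk_to_csv, iterable_slice, 1))
    print(results)  # [6, 9, 12, 15]
```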
I'll admit right off the bat, this solution has a few problems, but it shows the basic idea:
import itertools
from typing import Iterable
from multiprocessing import Pool

class Lengthed_ISlice:
    def __init__(self, iterable: Iterable, start: int, stop: int):
        self._start = start
        self._stop = stop
        self._islice = itertools.islice(iterable, self._start, self._stop)

    def __len__(self):
        # Pool.map uses len() to work out its chunking
        return self._stop - self._start

    def __iter__(self):
        return iter(self._islice)
This is a thin wrapper over an islice object that implements the required __len__ method so that it will work with Pool's map method:
def double(n):
    return n * 2

my_list = list(range(10, 100))
with Pool() as p:
    print(p.map(double, Lengthed_ISlice(my_list, 2, 9)))
    # Prints [24, 26, 28, 30, 32, 34, 36]
Main issues:

- It doesn't support any of the methods of islice except for __iter__. If you get errors about missing methods when/if you expand your usage of this, you'll need to implement the proper methods.
- It doesn't make use of Iterable's generic parameter. If you want better type hinting, you should introduce a TypeVar for the constructor argument and for __iter__.
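For the second issue, one possible shape for a generic version is sketched below. This is an assumption about how the typing could be added, not part of the original answer; LengthedISlice is a renamed variant:

```python
import itertools
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")

class LengthedISlice(Generic[T]):
    """Sketch of a generic variant: same behaviour, but with element typing."""

    def __init__(self, iterable: Iterable[T], start: int, stop: int):
        self._start = start
        self._stop = stop
        self._islice = itertools.islice(iterable, start, stop)

    def __len__(self) -> int:
        return self._stop - self._start

    def __iter__(self) -> Iterator[T]:
        return iter(self._islice)

# mypy/pyright can now infer the element type:
words: LengthedISlice[str] = LengthedISlice(["a", "b", "c", "d"], 1, 3)
print(list(words))  # ['b', 'c']
```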