简体   繁体   中英

Python Multiprocessing - Can we pass an (itertools.islice) iterable directly to pool.imap whithout converting to a list?

After reading a large table from DB2 (some tables have 100 million) in chunks, I convert the generator object to iterator using itertools.islice. I pass the iterator to multiprocessing pool.map that calls a function to extract these chunks to CSV parallely.

It works but before the parallel run starts, python pool.map converts the ITERATOR to a LIST that consumes lot of time. Is there a way I can avoid this list creation or convert into list faster? I also tried using POOL.IMAP but my notebook kernel dies when I run the program. To use IMAP, I will have to convert the iterator to a list which takes time again. Any thoughts?

generator_df = pd.read_sql(query2, test_connection_forbankcv_connection, chunksize = 5000)
iterable_slice = list(it.islice(generator_df, slice_start,slice_end))
results = p.imap(chunk_to_csv, iterable_slice, 1) 

I'll admit right off the bat, this solution has a few problems, but it shows the basic idea:

import itertools
from typing import Iterable
from multiprocessing import Pool

class Lengthed_ISlice:
    def __init__(self, iterable: Iterable, start: int, stop: int):
        self._start = start
        self._stop = stop
        self._islice = itertools.islice(iterable, self._start, self._stop)

    def __len__(self):
        return self._stop - self._start

    def __iter__(self):
        return iter(self._islice)

This is a thin wrapper over an islice object that implements the required __len__ method so that it will work with Pool 's map method:

def double(n):
    return n * 2

my_list = list(range(10, 100))

with Pool() as p:
    print(p.map(double, Lengthed_ISlice(my_list, 2, 9)))
    # Prints [24, 26, 28, 30, 32, 34, 36]

Main issues:

  • It doesn't properly delegate any functionality to the underlying islice except for __iter__ . If you get errors about missing methods when/if you expand your usage of this, you'll need to implement the proper methods.
  • For brevity, I didn't bother with steps since you aren't using non-default steps, and they complicate the math a tiny bit.
  • I didn't worry about making use of Iterable 's generic parameter. If you want better type hinting, you should introduce a TypeVar for the constructor argument and for __iter__ .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM