How to apply a map operation in batches?

Question

I am using a function that applies an object to a list of strings. However, it takes a long time to finish. According to the library's website, the authors recommend applying the object in chunks so as not to overload memory. I am applying the function as follows:

list_1 = ['hi how are you', 'i am good', ..., 'how is']
results = list(
    map(lambda string_list_elem: foo(string_list_elem, library_obj), list_1))

The above is taking too much time. What is the best way to speed up the function application? So far, I have tried splitting the list into chunks like this:

import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

list(split_seq(list_1, 500))

However, I do not know if this will work. Should I use a list comprehension, or just use this function to split the list? What is the recommended way to speed up the computation of results?

Answer 1

Since you can't show or share the crucial worker function foo(), I can't identify every potential bottleneck or say which optimization techniques would address it.
As a first step, I would suggest a concurrent approach using concurrent.futures.ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor
import functools

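def foo(string_list_elem, library_obj):
    ...  # your implementation

str_list = ['hi how are you', 'i am good', ..., 'how is']
library_obj = ...  # replace with your actual library object

with ThreadPoolExecutor() as executor:
    # bind library_obj once so executor.map only has to pass each string
    worker = functools.partial(foo, library_obj=library_obj)
    # executor.map returns results in the same order as str_list
    results = list(executor.map(worker, str_list))
    print(results)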

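This can speed up your processing significantly if foo() spends most of its time waiting on I/O or in code that releases the GIL. For CPU-bound pure-Python work, threads will give little benefit.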
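
If the library really does need its input in chunks, the split_seq() generator from the question can be combined with the thread pool so that only one batch is in flight at a time. The following is a minimal sketch under assumptions, not a definitive implementation: map_in_batches() is a hypothetical helper name, and foo() and library_obj here are toy stand-ins (str.upper) for the real function and library object, which were not shared.

```python
import functools
import itertools
from concurrent.futures import ThreadPoolExecutor


def split_seq(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    chunk = list(itertools.islice(it, size))
    while chunk:
        yield chunk
        chunk = list(itertools.islice(it, size))


def foo(string_list_elem, library_obj):
    # Hypothetical stand-in for the real foo(): just calls the object.
    return library_obj(string_list_elem)


def map_in_batches(func, items, batch_size, max_workers=None):
    """Apply `func` to every item, one batch at a time, preserving order."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for batch in split_seq(items, batch_size):
            # extend() consumes the iterator returned by executor.map, so the
            # whole batch finishes before the next batch is submitted.
            results.extend(executor.map(func, batch))
    return results


library_obj = str.upper  # placeholder for the real library object
str_list = ['hi how are you', 'i am good', 'how is']

worker = functools.partial(foo, library_obj=library_obj)
results = map_in_batches(worker, str_list, batch_size=2)
print(results)
```

The batch size (500 in the question) is something to tune against the library's memory guidance. Smaller batches cap memory use, but they also add a synchronization point after each batch.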

1 answer

solution1
1 ACCEPTED 2019-07-19 16:25:13
