I am using a function that applies an object to a list of strings. It takes a long time to finish, and the library's documentation says the function should be applied in chunks to avoid overloading memory. I am applying the function as follows:
list_1 = ['hi how are you', 'i am good', ..., 'how is']
results = list(
    map(lambda string_list_elem: foo(string_list_elem, library_obj), list_1))
The above takes too much time. What is the best way to speed up the function application? So far, I have tried to split the list into chunks like this:
import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

list(split_seq(list_1, 500))
However, I do not know whether this will help. Should I use a list comprehension, or split the list with this function first? What is the recommended way to accelerate building the results list?
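For completeness, this is how I would consume the chunks. The `foo` here is just a dummy stand-in (the real function and `library_obj` are not shown in this question):

```python
import itertools

def split_seq(iterable, size):
    # yield successive lists of at most `size` items from `iterable`
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

def foo(s, library_obj=None):
    # dummy stand-in for the real worker function
    return s.upper()

list_1 = ['hi how are you', 'i am good', 'how is'] * 400  # 1200 items
results = []
for chunk in split_seq(list_1, 500):
    results.extend(foo(s) for s in chunk)

print(len(results))  # 1200
```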
Since you can't show/share the crucial worker function foo(), I can't identify every potential bottleneck or match it to a specific optimization technique. As a first step, I would suggest a concurrent approach using concurrent.futures.ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
import functools

def foo(string_list_elem, library_obj):
    ...

str_list = ['hi how are you', 'i am good', ..., 'how is']

with ThreadPoolExecutor() as executor:
    # replace `<your_lib>` with your actual library_obj
    results = list(executor.map(functools.partial(foo, library_obj=<your_lib>), str_list))

print(results)
This will speed up your processing significantly if foo() spends most of its time waiting on I/O or inside C extension code that releases the GIL. If foo() is CPU-bound pure-Python work, threads won't help because of the GIL, and you should use ProcessPoolExecutor instead.
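For the CPU-bound case, a minimal sketch of the process-based alternative. The foo() body and the "my_lib" value are hypothetical stand-ins, and note that library_obj must be picklable so it can be sent to the worker processes:

```python
from concurrent.futures import ProcessPoolExecutor
import functools

def foo(string_list_elem, library_obj):
    # hypothetical stand-in for the real CPU-bound worker
    return f"{library_obj}:{string_list_elem.upper()}"

if __name__ == "__main__":
    str_list = ['hi how are you', 'i am good', 'how is']
    worker = functools.partial(foo, library_obj="my_lib")  # "my_lib" is a placeholder
    with ProcessPoolExecutor() as executor:
        # chunksize batches items per task, reducing inter-process overhead
        results = list(executor.map(worker, str_list, chunksize=100))
    print(results)
```

The chunksize argument to executor.map matters here for the same reason the question splits the list: sending items to workers one at a time has per-item overhead, so batching them amortizes it.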