
NLTK - Modifying nested for loop for multiprocessing

Currently, I have a nested for loop that appends to a list. I'm trying to produce the same output using multiprocessing.

My current code is:

output = []
for test in test_data:
    # For each n-gram: (last word, preceding context, model score)
    output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                   for ngram in test])

Here test_data is a generator object, and model.score is the score method of an NLTK language model.

All the solutions I have found and tried don't work (at least in my case).

Is there a way to get the same output with multiprocessing?

When it comes to multiprocessing, I believe the simplest way is the joblib package. To use it, all you need to do is write a function that takes one item of the generator and returns the result for that item.
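A minimal sketch of why that refactoring is enough, using a hypothetical `ToyModel` as a stand-in for the NLTK model (its `score(word, context)` only mimics the call shape of `model.score`; the scores are made up) and hypothetical toy data from `make_test_data()`. It checks that mapping the per-item function over the generator gives exactly the list the original loop builds, which is the shape any ordered parallel map needs.

```python
class ToyModel:
    """Hypothetical stand-in for an NLTK language model; not a real scorer."""

    def score(self, word, context=None):
        # Made-up score that depends only on the context length.
        return 1.0 / (1 + len(context or ()))


model = ToyModel()


def make_test_data():
    # Hypothetical data: each "test" is a list of n-gram tuples.
    sentences = [
        [("a", "b"), ("b", "c")],
        [("c", "a", "b")],
    ]
    return (sent for sent in sentences)  # a generator, like test_data


def func(test):
    # Score every n-gram of ONE test item: (last word, context, score).
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


# Original nested-loop version.
loop_output = []
for test in make_test_data():
    loop_output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                        for ngram in test])

# Same result via the per-item function and an ordered map.
map_output = list(map(func, make_test_data()))

assert map_output == loop_output
```

Because `func` only reads the shared model and touches nothing else, each call is independent, so the calls can be run concurrently as long as the results are collected in input order.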

In your case, it will look like so:

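from joblib import Parallel, delayed


def func(test):
    # Score every n-gram of one test item: (last word, context, score)
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1])) for ngram in test]


# One delayed call per item of the generator; results come back in input order
output = Parallel(n_jobs=4, backend="threading")(
    delayed(func)(test) for test in test_data
)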

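Now output holds the same list your loop produced, since Parallel returns the results in the same order as the inputs. You can change the number of jobs as you like, but I recommend setting it to multiprocessing.cpu_count(), which is 4 in my case (n_jobs=-1 also uses all CPUs). Note that backend="threading" runs the jobs as threads inside a single process, so for CPU-bound pure-Python work such as model.score the GIL limits the speedup. Dropping the backend argument uses joblib's default process-based backend (loky), which is real multiprocessing but has to serialize model and each test item to send them to the worker processes.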
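If you want separate processes rather than threads, the same per-item function also works with the standard library's `multiprocessing.Pool`, swapped in here instead of joblib. This is a sketch under assumptions: `ToyModel` and `make_test_data()` are hypothetical stand-ins for the NLTK model and test_data, and the model is built at module level so each worker process has its own copy. `Pool.map` returns results in input order, so the output matches the sequential loop.

```python
import multiprocessing


class ToyModel:
    """Hypothetical stand-in for an NLTK language model; not a real scorer."""

    def score(self, word, context=None):
        # Made-up score that depends only on the context length.
        return 1.0 / (1 + len(context or ()))


# Built at module level: inherited by workers under "fork",
# re-created when the module is re-imported under "spawn"/"forkserver".
model = ToyModel()


def func(test):
    # Must be a top-level function so it can be pickled by reference
    # and sent to the worker processes.
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


def make_test_data():
    # Hypothetical data: a generator of lists of n-gram tuples.
    sentences = [[("a", "b"), ("b", "c")], [("c", "a", "b")]]
    return (sent for sent in sentences)


if __name__ == "__main__":
    # The main guard stops worker processes from starting pools of their own.
    # One worker per CPU; Pool.map accepts a generator and keeps input order.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        output = pool.map(func, make_test_data())
    assert output == [func(test) for test in make_test_data()]
```

With a real NLTK model, loading or training it at module level (or in a `Pool` initializer) avoids pickling it for every task; whether that is cheaper than pickling depends on the model's size.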

You can also check the official joblib documentation for more examples.
