NLTK - Modifying nested for loop for multiprocessing

Currently, I have a nested for-loop that appends to a list. I'm trying to produce the same output using multiprocessing.

My current code is:

output = []
for test in test_data:
    # one list of (word, context, score) tuples per test item
    output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                   for ngram in test])

Here, test_data is a generator object, and model.score comes from the NLTK package.
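To make the shapes in this loop concrete without NLTK installed, here is a small sketch. ToyModel, make_test_data and the sample bigrams are hypothetical: the toy only mimics the model.score(word, context) call used above and looks scores up in a made-up table, rather than estimating probabilities as an NLTK model would.

```python
class ToyModel:
    """Hypothetical stand-in that mimics only the model.score(word, context) call."""

    def __init__(self, table):
        self.table = table  # maps a full n-gram tuple to a made-up score

    def score(self, word, context):
        # Unknown n-grams score 0.0 in this toy (a real NLTK model behaves differently)
        return self.table.get(tuple(context) + (word,), 0.0)


def make_test_data():
    # A generator, as in the question: each item is one test sentence
    # already split into bigram tuples (hypothetical sample data).
    for sent in (["a", "b", "c"], ["b", "a"]):
        yield [tuple(sent[i:i + 2]) for i in range(len(sent) - 1)]


model = ToyModel({("a", "b"): 0.5, ("b", "c"): 1.0})
output = []
for test in make_test_data():
    # One list per test item: (last word, preceding context, score) for each n-gram
    output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                   for ngram in test])

print(output)
```

Each element of output corresponds to one item of test_data, in order, which is what any parallel version has to reproduce.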

None of the solutions I have found and tried work (at least in my case).

Is there a way to get the same output using multiprocessing?

When it comes to multiprocessing, I believe the simplest approach is the joblib package. All you need to do is write a function that takes one item of the generator and returns the result for that item.

In your case, it will look like this:

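from joblib import Parallel, delayed

def func(test):
    # Score every n-gram of ONE test item; returns (word, context, score) tuples
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1])) for ngram in test]


# Results come back as a list in the same order as test_data.
# Note: backend="threading" runs the jobs as threads inside one process;
# omit it to use joblib's default process-based "loky" backend.
output = Parallel(n_jobs=4, backend="threading")(
    delayed(func)(test) for test in test_data)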

Now output is the result you are looking for. You can change the number of jobs as you like; however, I recommend setting it to multiprocessing.cpu_count(), which is 4 in my case.
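If joblib is not an option, the same one-function-per-item pattern can be sketched with the standard library's concurrent.futures. This is an alternative to the joblib answer, not what it does internally. ToyModel, score_test, score_all and the sample data are hypothetical, with ToyModel standing in for the NLTK model. The model is bound with functools.partial instead of being read from a global (which may not exist in a spawned worker), so it has to be picklable.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
import multiprocessing


class ToyModel:
    """Hypothetical stand-in for the NLTK model; must be picklable for process workers."""

    def __init__(self, table):
        self.table = table

    def score(self, word, context):
        return self.table.get(tuple(context) + (word,), 0.0)


def score_test(model, test):
    # The per-item function: scores all n-grams of ONE test item
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


def score_all(model, test_data, executor_cls=ProcessPoolExecutor, max_workers=None):
    # executor.map yields results in the same order as test_data;
    # partial() binds the model so it travels to each worker as an argument.
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(partial(score_test, model), test_data))


if __name__ == "__main__":
    model = ToyModel({("a", "b"): 0.5, ("b", "c"): 1.0})
    test_data = (t for t in [[("a", "b"), ("b", "c")], [("b", "a")]])  # a generator
    output = score_all(model, test_data, max_workers=multiprocessing.cpu_count())
    print(output)
```

executor.map submits all of test_data up front and returns results in input order, so output matches the sequential loop. Passing ThreadPoolExecutor as executor_cls gives roughly the thread-based behaviour of backend="threading". Each ProcessPoolExecutor task also carries a pickled copy of the model; with a large NLTK model, a larger chunksize in executor.map, or loading the model once per worker through the executor's initializer argument, would reduce that overhead.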

You can also check the official joblib documentation for more examples.
