NLTK - Modifying a nested for loop for multiprocessing

Question

Currently, I have a nested for-loop that appends to a list. I'm trying to create the same output while using multiprocessing.

My current code is:

output = []
for test in test_data:
    output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                   for ngram in test])

Here test_data is a generator object, and model.score is from the NLTK package.

None of the solutions I have found and tried work (at least in my case).

Is there a way to get the same output with multiprocessing?

Answer 1

When it comes to multiprocessing, I believe the simplest way to do it is with the joblib package. To use it, all you need to do is write a function that takes one item from the generator and returns the result for that item.

In your case, it would look like this:

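from joblib import Parallel, delayed

def func(test):
    # Score every n-gram in one item of test_data: (word, context, score).
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1])) for ngram in test]


# Run func on each item of the generator with 4 worker threads; results keep input order.
output = Parallel(n_jobs=4, backend="threading")(
    delayed(func)(test) for test in test_data
)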

Now, output is the output you are looking for. You can change the number of jobs as you like. However, I recommend setting it to multiprocessing.cpu_count(), which is 4 in my case.

You can also check the official joblib documentation for more examples.
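For readers who want to try the pattern without joblib or a trained model, here is a minimal sketch of the same idea using only the standard library. Note the assumptions: ToyModel is a hypothetical stand-in that only imitates the shape of model.score(word, context), score_all is an illustrative helper name, and the test data is made up. Like backend="threading" above, this uses threads rather than separate processes, so CPU-bound pure-Python scoring may see little speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import os


class ToyModel:
    """Hypothetical stand-in for an NLTK language model (not the real API)."""

    def score(self, word, context=None):
        # Made-up score: 1 / (1 + context length). Not a real probability.
        return 1.0 / (1 + len(context or ()))


model = ToyModel()


def func(test):
    # Same per-item work as in the answer: (word, context, score) for each n-gram.
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1])) for ngram in test]


def score_all(test_data, n_workers=None):
    # executor.map consumes the generator and returns results in input order.
    with ThreadPoolExecutor(max_workers=n_workers or os.cpu_count()) as executor:
        return list(executor.map(func, test_data))


# Made-up test data: a generator yielding one list of n-gram tuples per sentence.
test_data = ([("a", "b"), ("b", "c")] for _ in range(3))
output = score_all(test_data, n_workers=4)
print(output[0])
```

Because executor.map keeps the input order, output should have the same structure as the list built by the original nested loop.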


Solution 1
1 Accepted 2019-11-19 05:37:04
