NLTK - Modifying nested for loop for multiprocessing
Currently, I have a nested for loop that builds up a list. I'm trying to produce the same output using multiprocessing.
My current code is:
    output = []
    for test in test_data:
        output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                       for ngram in test])
Here test_data is a generator object, and model.score is from the NLTK package.
None of the solutions I have found and tried work (at least in my case).
Is there a way to get the same output with multiprocessing?
When it comes to multiprocessing, I believe the simplest way to do it is with the joblib package. To use it, all you need to do is create a function that takes one item from the generator and returns the result for that item.
In your case, it will look like this:
    from joblib import Parallel, delayed

    def func(test):
        # Process one item of the generator, as in the body of the outer loop.
        return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                for ngram in test]

    output = Parallel(n_jobs=4, backend="threading")(
        delayed(func)(test)
        for test in test_data)
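To check that the parallel version produces the same list as the original loop, here is a minimal self-contained sketch. It assumes joblib is installed, and it replaces the NLTK model with a made-up `ToyModel` class and a tiny hand-written generator (`make_test_data`), both hypothetical and only there so the example runs without a trained model.

```python
from joblib import Parallel, delayed


class ToyModel:
    """Hypothetical stand-in for an NLTK language model; it only mimics score()."""

    def score(self, word, context=None):
        # Made-up deterministic score so the example runs without NLTK.
        return 1.0 / (1 + len(word) + len(context or ()))


model = ToyModel()


def make_test_data():
    # Generator of "sentences", each a list of n-gram tuples (trigrams here).
    yield [("a", "b", "c"), ("b", "c", "d")]
    yield [("x", "y", "z")]


def func(test):
    # Same per-item work as the body of the outer loop in the question.
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


# Sequential reference, equivalent to the original nested loop.
expected = [func(test) for test in make_test_data()]

# Parallel version; joblib returns the results in input order.
output = Parallel(n_jobs=2, backend="threading")(
    delayed(func)(test) for test in make_test_data()
)

print(output == expected)
```

Because `Parallel` returns results in the order the items were submitted, the outer list lines up with the sequential one item by item.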
Now, output is the output you are looking for. You can change the number of jobs as you like. However, I recommend setting it to multiprocessing.cpu_count(), which is 4 in my case.
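As a small sketch of that recommendation, the job count can be read from the machine instead of being hard-coded. The `square` function below is a hypothetical placeholder for the per-item work, not part of the original code.

```python
import multiprocessing

from joblib import Parallel, delayed


def square(x):
    # Hypothetical placeholder for the per-item work (func in the answer).
    return x * x


# Number of logical CPUs reported for this machine.
n_jobs = multiprocessing.cpu_count()

results = Parallel(n_jobs=n_jobs, backend="threading")(
    delayed(square)(x) for x in range(5)
)
print(n_jobs, results)
```

joblib also accepts `n_jobs=-1` to use all available CPUs. Note that with `backend="threading"` the workers are threads, so pure-Python scoring may see limited speedup because of the GIL. joblib's default process-based backend may help there, provided the function and its arguments can be pickled.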
You can also check the official documentation for more examples.