Use nltk.corpus multithreaded
I would like to access nltk.corpus.wordnet in a multithreaded environment. As soon as I enable multithreading, methods such as synsets() fail. If I disable it, everything works fine.
The error messages vary from run to run. For example, one error looks like this, which looks very much like a race condition to me:
File "/home/lhk/anaconda3/envs/dlab/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1342, in synset_from_pos_and_offset
assert synset._offset == offset
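To illustrate why I suspect a race condition: assuming the reader seeks to an offset in a shared data file and then reads a line (an assumption about the internals, not something I have verified), a second thread that seeks between those two steps would make the first thread read the wrong record. The sketch below replays that interleaving by hand on a toy in-memory file. `build_data` and `read_record` are hypothetical helpers, not NLTK functions.

```python
import io

def build_data(words):
    """Build a toy data file where every line starts with its own offset."""
    lines, offsets, position = [], [], 0
    for word in words:
        line = f"{position:08d} {word}\n"
        offsets.append(position)
        lines.append(line)
        position += len(line)
    return "".join(lines), offsets

data, offsets = build_data(["dog", "cat", "bird"])
shared_file = io.StringIO(data)  # a single handle shared by both "threads"

def read_record(handle):
    """Read the line at the current position; return (stored offset, word)."""
    stored_offset, word = handle.readline().split()
    return int(stored_offset), word

# Replay one unlucky interleaving by hand, so the outcome is deterministic.
wanted_by_a, wanted_by_b = offsets[0], offsets[1]
shared_file.seek(wanted_by_a)   # thread A seeks to its record
shared_file.seek(wanted_by_b)   # thread B seeks before A has read
found_offset, found_word = read_record(shared_file)  # thread A reads B's record

# This is the kind of check that would fail in the traceback above.
offsets_match = (found_offset == wanted_by_a)
print(offsets_match)  # False
```

With real threads the interleaving depends on scheduling, which would explain why the error messages differ between runs.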
There are other questions about this:
The problem here was also caused by multithreading: What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?
This question has a more general title but seems to describe the same problem (multithreaded corpus loading fails): Python NLTK multi threading
There is also a GitHub issue about this: https://github.com/nltk/nltk/issues/1576
The solution to the first linked question was to load the corpus before the program branches into individual threads. I've done that: wordnet.ensure_loaded() is called before the multithreading starts.
The recommendation in the GitHub issue is to import wordnet within my threaded function. But that doesn't change anything.
A workaround is to make a deep copy of the corpus for every thread. Of course, this needs lots of memory and is not very efficient:
import copy
from nltk.corpus import wordnet as wn
wn.ensure_loaded()
# at the beginning of the multi-threaded environment
my_wn = copy.deepcopy(wn)
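One way to organize the per-thread copies is to keep them in `threading.local` storage, so each thread makes its copy lazily on first use. This is a minimal sketch using a stand-in object instead of the real corpus. `per_thread_copy` and `FakeReader` are hypothetical names, and the lock around `deepcopy` is a precaution I am assuming is useful, not something NLTK requires.

```python
import copy
import threading

_local = threading.local()       # holds one private copy per thread
_copy_lock = threading.Lock()    # serialize copying of the shared object

def per_thread_copy(shared):
    """Return this thread's private deep copy of `shared`, made on first use."""
    if not hasattr(_local, "copy"):
        with _copy_lock:
            _local.copy = copy.deepcopy(shared)
    return _local.copy

class FakeReader:
    """Hypothetical stand-in for a corpus reader with mutable internal state."""
    def __init__(self):
        self.cache = {}

shared = FakeReader()
copies = []  # keep references so the copies outlive their threads

def worker():
    reader = per_thread_copy(shared)
    reader.cache[threading.get_ident()] = True  # mutate only the private copy
    copies.append(reader)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(copies), shared.cache)  # 4 {}
```

With the real corpus, `shared` would be `wn` after `wn.ensure_loaded()`. The memory cost is the same as before, one full copy per thread.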