Use nltk.corpus multithreaded

I would like to access nltk.corpus.wordnet in a multithreaded environment. As soon as I enable multithreading, methods such as synsets() fail. If I disable it, everything works fine.
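A minimal sketch of the kind of setup that triggers this, assuming a thread pool from `concurrent.futures`. The helper name `lookup_concurrently` and its parameters are hypothetical and only for illustration; the lookup function is passed in, so with WordNet it would be `wn.synsets`.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_concurrently(lookup, words, max_workers=8):
    """Call lookup(word) for every word on a thread pool.

    Returns (results, errors): two dicts keyed by word, so that
    exceptions raised in worker threads are collected, not lost.
    """
    results, errors = {}, {}

    def task(word):
        try:
            return word, lookup(word), None
        except Exception as exc:  # keep the exception for inspection
            return word, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for word, value, exc in pool.map(task, words):
            if exc is None:
                results[word] = value
            else:
                errors[word] = exc
    return results, errors
```

With NLTK installed, this would be called as `lookup_concurrently(wn.synsets, ["dog", "cat", "tree"])` after `from nltk.corpus import wordnet as wn`. Setting `max_workers=1` is roughly the case without concurrency.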

The error messages vary from run to run. For example, an error could look like this, which looks very much like a race condition to me:

File "/home/lhk/anaconda3/envs/dlab/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1342, in synset_from_pos_and_offset
    assert synset._offset == offset
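If this really is a race on shared state inside the corpus reader, then serialising every call through a lock should make the errors disappear, at the cost of losing parallelism in the lookups. Below is a minimal sketch of such a check. The `Serialized` wrapper is a hypothetical helper, not part of NLTK, and it only guards calls made through the wrapper itself.

```python
import threading


class Serialized:
    """Wrap an object so that calls to its methods run one at a time."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.RLock()

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            # Hold the lock for the full duration of the call.
            with self._lock:
                return attr(*args, **kwargs)

        return locked_call
```

Assumed usage would be `safe_wn = Serialized(wn)` before the threads start, then `safe_wn.synsets(word)` inside each thread. Methods called later on the returned objects may go back to the corpus reader without taking the lock, so this is a diagnostic rather than a complete fix.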

There are other questions about this:

The solution to the first linked question was to load the corpus before the program branches into individual threads. I've done that: wordnet.ensure_loaded() is called before the multithreading starts.

The recommendation in the GitHub issue is to import wordnet within my threaded function. But that doesn't change anything.

A workaround is to make a deep copy of the corpus for every thread. Of course, this needs lots of memory and is not very efficient:

import copy
from nltk.corpus import wordnet as wn
wn.ensure_loaded()

# at the start of each thread: take a private deep copy of the loaded corpus
my_wn = copy.deepcopy(wn)
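The per-thread copy can be created lazily with `threading.local`, so that each thread copies the corpus at most once, however many tasks it runs. This is only a sketch of the workaround above: `PerThreadCopy` is a hypothetical helper name, and it assumes the original object is only ever copied and never queried directly.

```python
import copy
import threading


class PerThreadCopy:
    """Hand each thread its own deep copy of a shared object."""

    def __init__(self, shared):
        self._shared = shared
        self._lock = threading.Lock()
        self._local = threading.local()

    def get(self):
        try:
            # Fast path: this thread already has its copy.
            return self._local.value
        except AttributeError:
            # First call in this thread: copy under a lock so that
            # only one thread reads the original at a time.
            with self._lock:
                self._local.value = copy.deepcopy(self._shared)
            return self._local.value
```

Assumed usage would be `wn_copies = PerThreadCopy(wn)` right after `wn.ensure_loaded()`, and `wn_copies.get().synsets(word)` inside the threaded function. The memory cost is still one full copy per thread.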
