
LDA Mallet Multiprocessing Freezing

I am trying to run LDA Mallet on a dataset. The function takes lemma tokens and a collection of texts, which is our dataset. The issue is that when we run it, a freeze_support message pops up and all of our earlier methods that have already run start running again. The message says this is due to multiprocessing starting a new process before the current one has finished starting up. I am not sure how to fix it. This is run on macOS. Code and output are below.

import gensim
from gensim.models.coherencemodel import CoherenceModel
from gensim.corpora import Dictionary
from gensim.models.ldamodel import LdaModel
import os

def optimize_parameters(lemma_tokens, texts):
    
    os.environ['MALLET_HOME'] = '****/mallet-2.0.8'
    mallet_path = '****/mallet-2.0.8/bin/mallet'

    id2word = Dictionary(lemma_tokens)

    # Filtering Extremes
    id2word.filter_extremes(no_below=2, no_above=.99)

    # Creating a corpus object 
    corpus = [id2word.doc2bow(d) for d in lemma_tokens]

    model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=5, id2word=id2word, workers=4)
    coherencemodel = CoherenceModel(model=model, texts=lemma_tokens, dictionary=id2word, coherence='c_v')
    coherence = coherencemodel.get_coherence()

The "****" is the rest of the path, which can't be shown for privacy reasons.

The error output:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
<10> LL/token: -6.83952
<20> LL/token: -6.70949

I figured it out. You have to put the entire script inside an if __name__ == '__main__': block:

if __name__ == '__main__':
  imports
  code

Found the solution in an old Google Groups thread. Link: https://groups.google.com/g/gensim/c/-gMNdkujR48/m/i4Dn1_bjBQAJ

To summarize what is happening: on macOS, multiprocessing starts each worker by launching a fresh Python interpreter that re-imports the main script (the "spawn" start method, which has been the default on macOS since Python 3.8). Any code at the top level of the script therefore runs again in every worker instead of just once. That includes the function call that started the workers in the first place, which is what triggers the RuntimeError. The if statement checks whether the file is being run as the main script rather than being re-imported by a worker. If it is, we do the entire call. If not, we don't run anything at all. This works because it makes sure the code runs only once, in the parent process.
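The same pattern can be sketched with only the standard library, since the original script needs Mallet and a dataset to run. This is a minimal illustration, not gensim's actual code: score_topic and main are hypothetical names, and the worker function is just a stand-in for whatever work the worker processes do.

```python
import multiprocessing as mp


def score_topic(topic_id):
    # Hypothetical stand-in for the per-item work done in a worker process.
    return topic_id * topic_id


def main():
    # Everything that should run exactly once lives here, not at module level.
    print("building model (parent process only)")
    with mp.Pool(processes=2) as pool:
        scores = pool.map(score_topic, range(5))
    print("scores:", scores)
    return scores


if __name__ == "__main__":
    # True only when this file is executed as a script. A spawned worker
    # re-imports the file under a different module name, so it defines
    # score_topic and main but never calls main() itself.
    main()
```

Moving the call to main() out of the guard and up to module level should reproduce the RuntimeError from the question on a platform that uses the spawn start method, because each worker would then try to create its own pool while it is still starting up.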
