
How to specify random_state in an LDA model for topic modelling

I read the gensim LDA model documentation about random_state which states that:

random_state ({np.random.RandomState, int}, optional) 

– Either a randomState object or a seed to generate one. Useful for reproducibility.

I have been trying to set random_state=42, or:

random_seed=42
state=np.random.RandomState(random_seed)
state.randn(1)
random_state=state.randn(1) 

which did not work. Can anyone suggest what I should do?

model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, random_state=None)

I tried to use it without random_state and the function works, but with random_state I get an error message saying the LDA model is not defined.

def compute_coherence_values(dictionary, corpus, texts, limit, random_state, start=2, step=3):
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        #model=LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics)
        model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, 
                                                  random_state)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())

    return model_list, coherence_values

The mistake in your code is here:

 model=ldaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics, 
                                                  random_state)

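You can't pass the variable random_state positionally after keyword arguments. Python requires every positional argument to come before any keyword argument, so this call raises a SyntaxError before the model is ever built. Pass it by name instead. Note also that gensim's class is spelled LdaModel; unless you imported it under the name ldaModel, that spelling raises a NameError, which would explain the "not defined" message. So it should be like this: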

model = LdaModel(corpus=corpus, id2word=dictionary, num_topics=num_topics,
                 random_state=random_state)
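To see the rule in isolation, here is a minimal sketch that does not need gensim. The function build_model is a hypothetical stand-in for a constructor such as LdaModel (it is not part of any library), and the call is compiled from a string only so the SyntaxError can be caught and shown.

```python
# Hypothetical stand-in for a model constructor; it only echoes its arguments.
def build_model(corpus=None, num_topics=10, random_state=None):
    return {"corpus": corpus, "num_topics": num_topics, "random_state": random_state}

# A positional argument placed after keyword arguments is rejected at compile
# time, before any function is called.
bad_call = "build_model(corpus=[], num_topics=5, 42)"
try:
    compile(bad_call, "<example>", "eval")
    compiled_ok = True
except SyntaxError as err:
    compiled_ok = False
    print("SyntaxError:", err.msg)

# Naming the argument makes the call valid.
model = build_model(corpus=[], num_topics=5, random_state=42)
print(model["random_state"])
```

The same applies to the real constructor: the seed either comes before all keyword arguments in its correct position, or is passed as random_state=... by name, which is the clearer choice.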

I also have an LDA implementation that uses LatentDirichletAllocation from sklearn.decomposition, where random_state takes an integer. Here is an example:

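from sklearn.decomposition import LatentDirichletAllocation

lda_model = LatentDirichletAllocation(n_components=10,           # number of topics
                                      max_iter=10,
                                      learning_method='online',
                                      random_state=100,          # fixed seed for reproducibility
                                      batch_size=128,
                                      evaluate_every=-1,
                                      n_jobs=-1)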
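As a rough check that the seed does what you expect, the sketch below fits the scikit-learn model twice with the same random_state and compares the learned topic-word matrices. The count matrix X is made-up toy data and fit_topics is a helper name chosen for this illustration; the small n_components and max_iter values are only there to keep it fast.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy document-term count matrix (20 documents, 12 terms); made up for illustration.
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(20, 12))

def fit_topics(seed):
    """Fit a small LDA model with the given seed and return its topic-word matrix."""
    lda = LatentDirichletAllocation(n_components=3, max_iter=5, random_state=seed)
    lda.fit(X)
    return lda.components_

same_seed_a = fit_topics(100)
same_seed_b = fit_topics(100)
other_seed = fit_topics(7)

# Identical seeds on identical data should reproduce the same topics.
print(np.allclose(same_seed_a, same_seed_b))
# A different seed will normally lead to a different result.
print(np.allclose(same_seed_a, other_seed))
```

If the first comparison is not True in your own pipeline, something other than the model seed is probably varying between runs, for example the order of the documents or the vocabulary.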

Here is a good tutorial on how to implement LDA
