Scikit-Learn GridSearchCV failing on a gensim LDA model

This is the code for creating the model:

import gensim

NUM_TOPICS = 4
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=NUM_TOPICS,
                                           id2word=dictionary, passes=100)
ldamodel.save('model5.gensim')
topics = ldamodel.print_topics(num_words=4)
print(topics)

This is the code for GridSearchCV:

from sklearn.model_selection import GridSearchCV

search_params = {'n_components': [4, 6, 8, 10, 20], 'learning_decay': [.5, .7, .9]}

# Init Grid Search Class (ldamodel is the gensim LdaModel created above)
model = GridSearchCV(ldamodel, param_grid=search_params)

# Do the Grid Search
model.fit(data_vectorized)

This is the output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-108-1a35c49ac19e> in <module>
      9 
     10 # Do the Grid Search
---> 11 model.fit(data_vectorized)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    627 
    628         scorers, self.multimetric_ = _check_multimetric_scoring(
--> 629             self.estimator, scoring=self.scoring)
    630 
    631         if self.multimetric_:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in _check_multimetric_scoring(estimator, scoring)
    471     if callable(scoring) or scoring is None or isinstance(scoring,
    472                                                           str):
--> 473         scorers = {"score": check_scoring(estimator, scoring=scoring)}
    474         return scorers, False
    475     else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\metrics\_scorer.py in check_scoring(estimator, scoring, allow_none)
    399     if not hasattr(estimator, 'fit'):
    400         raise TypeError("estimator should be an estimator implementing "
--> 401                         "'fit' method, %r was passed" % estimator)
    402     if isinstance(scoring, str):
    403         return get_scorer(scoring)
TypeError: estimator should be an estimator implementing 'fit' method, <gensim.models.ldamodel.LdaModel object at 0x000002121E55D3C8> was passed

You are trying to use the GridSearchCV object from scikit-learn, which requires the model it is given to implement certain methods (as the error message says, the fit method in particular). gensim's LdaModel trains inside its constructor and has no fit method. Since gensim does not follow the scikit-learn estimator interface, you need to make the two compatible yourself: subclass scikit-learn's BaseEstimator and encapsulate the gensim training in its fit method. GridSearchCV also needs a way to rate each fit, so the wrapper must define a score method, or you must pass a scoring callable.

Also, according to the LdaModel documentation, it does not accept the parameters (n_components, learning_decay) that you are attempting to search over. Those names belong to scikit-learn's own LatentDirichletAllocation; the closest LdaModel arguments are num_topics and decay. You can only search over parameters that the estimator actually exposes.
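
One way to check a grid before running it is to compare its keys against what the estimator reports through get_params(). The sketch below uses scikit-learn's LatentDirichletAllocation as the example estimator, since that is where n_components and learning_decay come from. The helper name unknown_grid_keys is hypothetical, and the grid is assumed to be a single dict rather than a list of dicts.

```python
from sklearn.decomposition import LatentDirichletAllocation


def unknown_grid_keys(estimator, param_grid):
    """Return the grid keys that the estimator does not expose as parameters."""
    valid = set(estimator.get_params())
    return sorted(set(param_grid) - valid)


search_params = {'n_components': [4, 6, 8, 10, 20], 'learning_decay': [.5, .7, .9]}

# learning_decay only has an effect with the online learning method.
lda = LatentDirichletAllocation(learning_method='online')

# An empty list means every key in the grid can be searched on this estimator.
print(unknown_grid_keys(lda, search_params))
```

The same check works on any object that implements get_params(), including a custom wrapper around LdaModel.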
