简体   繁体   中英

How to load the pre-trained doc2vec model and use it's vectors

Does anyone know which function should I use if I want to use the pre-trained doc2vec models in this website https://github.com/jhlau/doc2vec ?

I know we can use the Keyvectors.load_word2vec_format() to laod the word vectors from pre-trained word2vec models, but do we have a similar function to load pre-trained doc2vec models as well in gensim?

Thanks a lot.

When a model like Doc2Vec is saved with gensim's native save() , it can be reloaded with the native load() method:

model = Doc2Vec.load(filename)

Note that large internal arrays may have been saved alongside the main filename, in other filenames with extra extensions – and all those files must be kept together to re-load a fully-functional model. (You still need to specify only the main save file, and the auxiliary files will be discovered at expected names alongside it in the same directory.)

You may have other issues trying to use those pre-trained models. In particular:

  • as noted in the linked page, the author used a custom variant of gensim that forked off about 2 years ago; the files might not load in standard gensim, or later gensims

  • it's not completely clear what parameters were used to train those models (though I suppose if you succeed in loading them you could see them as properties in the model), and how much meta-optimization was used for which purposes, and whether those purposes will match your own project

  • if the parameters are as shown in one of the repo files, [train_model.py][1] , some are inconsistent with best practices (a min_count=1 is usually bad for Doc2Vec ) or apparent model-size (a mere 1.4GB model couldn't hold 300-dimensional vectors for all of the millions of documents or word-tokens in 2015 Wikipedia)

I would highly recommend training your own model, on a corpus you understand, with recent code, and using metaparameters optimized for your own purposes.

Try this:

import gensim.models as g

model="model_folder/doc2vec.bin"  #point to downloaded pre-trained doc2vec model

#load model
m = g.Doc2Vec.load(model)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM