
Is the Gensim word2vec model the same as the standard model by Mikolov?

I am implementing a paper to compare our performance. In the paper, the author says:

300-dimensional pre-trained word2vec vectors (Mikolov et al., 2013)

I am wondering whether the pretrained word2vec Gensim model here is the same as the pretrained embeddings on the official Google site (the GoogleNews-vectors-negative300.bin.gz file).


My doubt arises from this line in the Gensim documentation (in the Word2Vec Demo section):

We will fetch the Word2Vec model trained on part of the Google News dataset, covering approximately 3 million words and phrases

Does this mean the model in Gensim is not fully trained? Is it different from the official embeddings by Mikolov?

That demo code for reading word-vectors downloads the exact same Google-trained GoogleNews-vectors-negative300 set of vectors. (No one else can re-train that dataset, because the original corpus of news articles used, over 100B words of training data from around 2013 if I recall correctly, is internal to Google.)
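
For illustration, here is a minimal sketch of loading those same vectors through Gensim's downloader API (the attribute names below assume Gensim 4.x; the first call downloads and caches roughly 1.6 GB):

import gensim.downloader as api

# Fetches the Google-trained GoogleNews-vectors-negative300 vectors,
# exposed in gensim-data under the name "word2vec-google-news-300".
wv = api.load("word2vec-google-news-300")

print(wv.vector_size)           # 300 dimensions
print(len(wv.index_to_key))     # roughly 3 million words and phrases
print(wv.most_similar("king", topn=3))

This returns a KeyedVectors object holding the frozen, pre-trained vectors; there is no further training involved, so it is not a partially trained model, just vectors trained on part of Google's internal news corpus.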

Algorithmically, the gensim Word2Vec implementation was closely modeled on the word2vec.c code released by Google/Mikolov, so it should match its results in measurable respects for any newly-trained vectors. (Slight differences in the threading approaches may cause slight differences in the resulting vectors.)
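
As a hedged sketch, training fresh vectors in Gensim with settings roughly mirroring the word2vec.c defaults might look like the following (the toy corpus and the choice of parameter values here are assumptions for illustration, not the configuration used for the GoogleNews vectors; parameter names assume Gensim 4.x):

from gensim.models import Word2Vec

# Placeholder corpus; substitute your own tokenized sentences.
sentences = [
    ["the", "quick", "brown", "fox"],
    ["jumps", "over", "the", "lazy", "dog"],
]

model = Word2Vec(
    sentences,
    vector_size=300,  # dimensionality, matching the 300-d GoogleNews vectors
    window=5,         # context window, as in word2vec.c's default
    min_count=1,      # word2vec.c defaults to 5; lowered here for the toy corpus
    sg=0,             # 0 = CBOW (word2vec.c default), 1 = skip-gram
    negative=5,       # negative sampling, as in word2vec.c
    workers=4,        # threading; the source of the minor non-determinism noted above
)

print(model.wv.most_similar("fox", topn=2))

With the same corpus and comparable hyperparameters, vectors trained this way should behave essentially like those from the original C tool, up to the small run-to-run variation that multithreaded training introduces.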
