
Is the Gensim word2vec model the same as the standard model by Mikolov?

Original post: 2020-04-19 11:28:03. Tags: python, nlp, gensim, word2vec

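I am implementing a paper so that I can compare our performance against it. In the paper, the author says:

300-dimensional pre-trained word2vec vectors (Mikolov et al., 2013)

I am wondering whether the pretrained word2vec model that Gensim provides is the same as the pretrained embeddings on the official Google site (the GoogleNews-vectors-negative300.bin.gz file).

My doubt comes from this line in the Gensim documentation (in the Word2Vec Demo section):

We will fetch the Word2Vec model trained on part of the Google News dataset, covering approximately 3 million words and phrases

Does this mean the model offered through gensim is not fully trained? Does it differ in any way from Mikolov's official embeddings?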

1 Answer

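The demo code for reading word vectors downloads exactly the same Google-trained GoogleNews-vectors-negative300 vector set. (Nobody else can re-train on that dataset, because the original corpus of news articles that was used, more than 100B words of training data from around 2013 if I recall correctly, is internal to Google.)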
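If you want to check this yourself, one option is to load both copies and compare a few vectors. The following is only a minimal sketch: `compare_embeddings` and `load_both` are hypothetical helper names introduced here, the dataset name "word2vec-google-news-300" is assumed to be the one the Gensim demo fetches through `gensim.downloader`, and the local path is assumed to point at the file downloaded from the Google site. Both loads are large (well over a gigabyte each).

```python
import math


def compare_embeddings(kv_a, kv_b, sample_words, tol=1e-6):
    """Return the words in sample_words whose vectors differ between two tables.

    kv_a and kv_b only need to support kv[word] -> sequence of floats,
    which gensim's KeyedVectors does (a plain dict works too).
    """
    mismatched = []
    for word in sample_words:
        vec_a, vec_b = kv_a[word], kv_b[word]
        same_length = len(vec_a) == len(vec_b)
        if not same_length or any(
            not math.isclose(float(x), float(y), abs_tol=tol)
            for x, y in zip(vec_a, vec_b)
        ):
            mismatched.append(word)
    return mismatched


def load_both(bin_path="GoogleNews-vectors-negative300.bin.gz"):
    """Load the Gensim demo vectors and the official file (hypothetical helper).

    gensim is imported here so that compare_embeddings works without it.
    bin_path is an assumed local path to the file from the Google site.
    """
    import gensim.downloader as api
    from gensim.models import KeyedVectors

    # Assumed dataset name for the vectors used in the Gensim demo.
    demo_kv = api.load("word2vec-google-news-300")
    # The official release is in the binary word2vec.c format.
    official_kv = KeyedVectors.load_word2vec_format(bin_path, binary=True)
    return demo_kv, official_kv
```

With both files available, `compare_embeddings(*load_both(), ["king", "queen", "computer"])` should return an empty list if the two sets really are identical; the sample words are arbitrary choices.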

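Algorithmically, the gensim Word2Vec implementation was closely modeled on the word2vec.c code released by Google/Mikolov, so for any newly trained vectors its results should match in every measurable respect. (Small differences in the threading approach may produce slight variations.)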

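Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.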
