Adjust Google's Word2Vec loaded with Gensim to your vocabulary, then create the embedding vectors

Question

I would like to know how to restrict Google's Word2Vec to my own vocabulary. Link to Google's Word2Vec: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

This is what I have:

import gensim
import numpy as np

# Load Google's pre-trained Word2Vec model.
# Recent gensim releases expose this loader on KeyedVectors rather than Word2Vec.
model = gensim.models.KeyedVectors.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)

embedding_matrix = np.zeros((len(my_vocabulary), 300))

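my_vocabulary is the list of unique words in my corpus. How do I fill the embedding matrix only for the words in my_vocabulary? I would also like the flexibility to fall back to zeros when one of my words does not exist in Google's word2vec.

Thanks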

Answer 1

You can use gensim.models.Word2Vec to build a custom w2v model.

import gensim

# Each sentence is a list of tokens; min_count=1 keeps words that occur only once.
sentences = [['cats', 'can', 'not', 'fly'], ['dogs', 'cant', 'drive']]
model = gensim.models.Word2Vec(sentences, min_count=1)

Reference: https://rare-technologies.com/word2vec-tutorial/

Answer 2

You can fill the embedding matrix with the following code:

import gensim
import numpy as np

# Load Google's pre-trained Word2Vec model.
# Recent gensim releases expose this loader on KeyedVectors rather than Word2Vec.
model = gensim.models.KeyedVectors.load_word2vec_format('path/to/bin', binary=True)

# One row per vocabulary word; rows stay at zero unless overwritten below.
embedding_matrix = np.zeros((len(my_vocabulary), 300))

for index, word in enumerate(my_vocabulary):
    try:
        # update embedding matrix using Google's pretrained model
        embedding_matrix[index] = model[word]
    except KeyError:
        # when word isn't found in pretrained model, we keep the embedding matrix unchanged at that index (assigned to zero)
        pass

Additionally, you can explore ways of initializing out-of-vocabulary words to values other than zero.
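
One possible way to do this, sketched under assumptions: missing words are drawn from a normal distribution whose mean and standard deviation match the vectors that were found. The function name build_embedding_matrix, its parameters, and the small pretrained dict are hypothetical and only for illustration. The vectors argument can be anything that supports `word in vectors` and `vectors[word]`, which a gensim KeyedVectors object does as well as a plain dict.

```python
import numpy as np


def build_embedding_matrix(vocabulary, vectors, dim=300, seed=0):
    """Return (matrix, missing_words) for the given vocabulary.

    Hypothetical helper: `vectors` must support `word in vectors` and
    `vectors[word]` (a plain dict is used below as a stand-in).
    """
    rng = np.random.default_rng(seed)
    matrix = np.zeros((len(vocabulary), dim), dtype=np.float32)
    found = np.zeros(len(vocabulary), dtype=bool)

    for index, word in enumerate(vocabulary):
        if word in vectors:
            matrix[index] = vectors[word]
            found[index] = True

    # Only re-initialize when some words were found and some were missing;
    # otherwise there are no statistics to copy or nothing to fill.
    if found.any() and not found.all():
        known = matrix[found]
        n_missing = int((~found).sum())
        matrix[~found] = rng.normal(known.mean(), known.std(), size=(n_missing, dim))

    missing_words = [w for w, ok in zip(vocabulary, found) if not ok]
    return matrix, missing_words


# Stand-in for the loaded pretrained model (assumption, not real GoogleNews data).
pretrained = {"cat": np.ones(4), "dog": np.full(4, 3.0)}
matrix, missing = build_embedding_matrix(["cat", "dog", "unicorn"], pretrained, dim=4)
```

Returning the list of missing words makes it easy to check how much of the vocabulary the pretrained model actually covers. Whether random initialization helps compared with zeros depends on the downstream model, so it is worth trying both.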

Answer 1
1 2018-04-15 23:49:06

Answer 2
0 2018-04-16 09:24:37
