How to use a pretrained Word2Vec model in Tensorflow

Question

I have a Word2Vec model that was trained in Gensim. How can I use it in Tensorflow for word embeddings? I don't want to train the embeddings from scratch in Tensorflow. Can someone show me how to do it with some example code?

Answer 1

Let's assume you have a dictionary (vocab) and an inverse list (inv_dict), where the list indices correspond to the most common words:

vocab = {'hello': 0, 'world': 2, 'neural':1, 'networks':3}
inv_dict = ['hello', 'neural', 'world', 'networks']

Notice how each index in inv_dict matches the corresponding value in vocab. Now declare your embedding matrix and fill in the values:

import numpy as np

vocab_size = len(inv_dict)
emb_size = 300 # or whatever the size of your embeddings
embeddings = np.zeros((vocab_size, emb_size), dtype=np.float32) # one row per word

from gensim.models.keyedvectors import KeyedVectors                         
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)

for k, v in vocab.items():
  embeddings[v] = model[k]
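
The loop above raises a KeyError if a vocabulary word has no vector in the pretrained model. One possible way to guard against that is sketched below. The helper name build_embedding_matrix and the toy_vectors dictionary are hypothetical: toy_vectors stands in for the loaded KeyedVectors object, which supports the same `word in model` and `model[word]` operations. Leaving missing words as zero rows is an assumption, not the only reasonable choice.

```python
import numpy as np

def build_embedding_matrix(vocab, vectors, emb_size):
    """Copy pretrained vectors into a (len(vocab), emb_size) matrix.

    `vectors` is anything supporting `word in vectors` and `vectors[word]`.
    Words without a pretrained vector keep an all-zero row and are reported.
    """
    matrix = np.zeros((len(vocab), emb_size), dtype=np.float32)
    missing = []
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = vectors[word]
        else:
            missing.append(word)
    return matrix, missing

# Hypothetical stand-in for the gensim model, with 3-dimensional vectors.
toy_vectors = {
    'hello': np.array([0.1, 0.2, 0.3]),
    'world': np.array([0.4, 0.5, 0.6]),
    'neural': np.array([0.7, 0.8, 0.9]),
}
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}

embeddings, missing = build_embedding_matrix(vocab, toy_vectors, emb_size=3)
print(embeddings.shape)  # matrix shape: one row per vocabulary word
print(missing)           # words that had no pretrained vector
```

With the real model you would pass the KeyedVectors object and emb_size = 300 instead of the toy values.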

You now have your embedding matrix. Now let's assume you want to train on the sample x = ['hello', 'world']. The neural net can't consume raw strings, so we need to integerize, i.e. map each word to its index in vocab:

x_train = []
for word in x:  
  x_train.append(vocab[word]) # integerize
x_train = np.array(x_train) # make into numpy array
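
The placeholder used later in this answer has shape [None, input_size], so it expects a 2-D batch of samples rather than a single 1-D sample. A minimal sketch of integerizing several samples at once follows. The helper name integerize is hypothetical, and the sketch assumes every sample has the same length and every word is present in vocab.

```python
import numpy as np

vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}

def integerize(samples, vocab):
    """Map a list of equal-length token lists to a 2-D array of word ids."""
    return np.array([[vocab[word] for word in sample] for sample in samples],
                    dtype=np.int32)

# Two samples of two tokens each -> array of shape (batch, input_size).
x_batch = integerize([['hello', 'world'], ['neural', 'networks']], vocab)
print(x_batch.shape)
```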

Now we are good to go with embedding our samples on the fly:

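import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder)
# input_size: number of tokens per sample (2 for the sample above)
x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
  embedded_x = tf.nn.embedding_lookup(embeddings, x_model)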
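
To see what the lookup computes, here is a NumPy-only sketch. For a single embedding matrix, tf.nn.embedding_lookup gathers the rows selected by the ids, which is the same result as integer-array indexing in NumPy. The toy matrix and ids below are made up for illustration.

```python
import numpy as np

# Toy embedding matrix: 4 words, 3 dimensions (row i belongs to word id i).
embeddings = np.arange(12, dtype=np.float32).reshape(4, 3)

# One sample with two word ids, e.g. 'hello' -> 0 and 'world' -> 2.
x_batch = np.array([[0, 2]], dtype=np.int32)

# Indexing with an integer array gathers the matching rows.
embedded = embeddings[x_batch]
print(embedded.shape)  # ids shape followed by the embedding size
```

The output shape is the shape of the ids with the embedding size appended, so a batch of shape [batch, input_size] becomes [batch, input_size, emb_size].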

Now embedded_x goes into your convolution or whatever layer comes next. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.
