
How to train a model with only an Embedding layer in Keras and no labels

I have some text without any labels. Just a bunch of text files. And I want to train an Embedding layer to map the words to embedding vectors. Most of the examples I've seen so far are like this:

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(optimizer='rmsprop',
    loss='binary_crossentropy',
    metrics=['acc'])
model.fit(x_train, y_train,
    epochs=10,
    batch_size=32,
    validation_data=(x_val, y_val))

They all assume that the Embedding layer is part of a bigger model which tries to predict a label. But in my case, I have no label. I'm not trying to classify anything. I just want to train the mapping from words (more precisely, integers) to embedding vectors. But the fit method of the model asks for x_train and y_train (as in the example given above).

How can I train a model with only an Embedding layer and no labels?

[UPDATE]

Based on the answer I got from @Daniel Möller, the Embedding layer in Keras implements a supervised algorithm and thus cannot be trained without labels. Initially, I thought it was a variation of Word2Vec and therefore would not need labels to be trained. Apparently, that's not the case. Personally, I ended up using FastText, which has nothing to do with Keras or Python.
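For reference, a rough sketch of what this kind of unsupervised training can look like from Python, assuming the gensim library (gensim >= 4.0 API) and a list of tokenized sentences; this is only an alternative to the standalone FastText tool mentioned above:

from gensim.models import FastText

# Assumption: `sentences` is a list of token lists built from the text files,
# e.g. [["the", "cat", "sat"], ["a", "dog", "ran"], ...]
ft = FastText(sentences=sentences, vector_size=100, window=5, min_count=1, epochs=10)

vector = ft.wv["cat"]   # 100-dimensional embedding vector for "cat"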

Does it make sense to do that without a label/target?

How will your model decide which values in the vectors are good for anything if there is no objective?

All embeddings are "trained" for a purpose. If there is no purpose, there is no target; if there is no target, there is no training.

If you really want to transform words into vectors without any purpose/target, you've got two options:

  • Make one-hot encoded vectors. You may use the Keras to_categorical function for that.
  • Use a pretrained embedding. There are some available, such as GloVe, embeddings from Google, etc. (All of them were trained at some point for some purpose.) A minimal sketch of both options follows this list.
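The sketch below reuses the max_words and embedding_dim names from the snippet in the question; the pretrained-matrix construction is only hinted at in a comment:

import numpy as np
from keras.utils import to_categorical
from keras.layers import Embedding

# Option 1: one-hot encoded vectors from integer word indices
word_indices = np.array([3, 7, 1])                       # integer-encoded words
one_hot = to_categorical(word_indices, num_classes=max_words)

# Option 2: a frozen Embedding layer filled with pretrained vectors.
# embedding_matrix would be built by reading e.g. a GloVe text file and
# copying each word's vector into the row given by that word's index.
embedding_matrix = np.zeros((max_words, embedding_dim))
embedding_layer = Embedding(max_words, embedding_dim,
                            weights=[embedding_matrix],
                            trainable=False)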

A very naive approach based on our chat, considering word distance

Warning: I don't really know anything about Word2Vec, but I'll try to show how to add rules for your embedding using some naive kind of word distance, and how to use dummy "labels" just to satisfy Keras' way of training.

from keras.layers import Input, Embedding, Subtract, Lambda
import keras.backend as K
from keras.models import Model

input1 = Input((1,)) #word1
input2 = Input((1,)) #word2

embeddingLayer = Embedding(...params...)

word1 = embeddingLayer(input1)
word2 = embeddingLayer(input2)

#naive distance rule, subtract, expect zero difference
word_distance = Subtract()([word1,word2])

#reduce all dimensions to a single dimension
word_distance = Lambda(lambda x: K.mean(x, axis=-1))(word_distance)

model = Model([input1,input2], word_distance)

Now that our model directly outputs a word distance, our labels will be "zero". They're not really labels for supervised training, but they're the expected result of the model, which is something Keras needs in order to train.

We can use, for instance, mae (mean absolute error) or mse (mean squared error) as the loss function.

model.compile(optimizer='adam', loss='mse')

And training with word2 being the word after word1:

import numpy as np

# entireText: the whole corpus as one long sequence of integer word ids
# (see the sketch below for one way to build it)
xTrain = entireText
xTrain1 = xTrain[:-1]                  # each word
xTrain2 = xTrain[1:]                   # the word that follows it
yTrain = np.zeros((len(xTrain1),))     # dummy targets: the expected distance is zero

model.fit([xTrain1, xTrain2], yTrain, .... more params.... )
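For completeness, a rough sketch of how entireText could be built from the raw text files with the Keras Tokenizer (the `texts` list and max_words are assumptions, not part of the original answer):

import numpy as np
from keras.preprocessing.text import Tokenizer

# Assumption: `texts` is a list of strings, one per text file
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)

# One long sequence of integer word ids, in reading order
entireText = np.concatenate(tokenizer.texts_to_sequences(texts))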

Although this may be completely wrong regarding what Word2Vec really does, it shows the main points, which are:

  • Embedding layers don't have special properties; they're just trainable lookup tables (see the sketch after this list)
  • Rules for creating an embedding should be defined by the model and its expected outputs
  • A Keras model will need "targets", even if those targets are not "labels" but a mathematical trick for an expected result.
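A quick way to check the first point, as a minimal sketch: the layer's weights are literally a (vocabulary x dimensions) matrix, and the layer's output is just a row lookup into that matrix.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(input_dim=10, output_dim=4, input_length=1))

table = model.layers[0].get_weights()[0]   # shape (10, 4): the lookup table itself
out = model.predict(np.array([[3]]))       # embedding of word id 3
print(np.allclose(out[0, 0], table[3]))    # True: the output is just row 3 of the table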


 