

Keras - How to use the learned Embedding() Layer for Input and Output?

I would like to train a model to generate text, similar to this blog post.

This model uses - as far as I understand it - the following architecture:
[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [One-Hot Encoded "next word"]

Basically, the author models the process as a classification problem, where the output layer has as many dimensions as there are words in the corpus.
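
For context, a minimal Keras sketch of that classification setup could look like the following; vocab_size, seq_len and the layer sizes are placeholders, not values taken from the blog post:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

vocab_size = 10000   # number of distinct words in the corpus (placeholder)
seq_len = 20         # length of the input word-index sequences (placeholder)

model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=seq_len))  # word indices -> dense vectors
model.add(LSTM(256))                                         # summarise the sequence
model.add(Dense(vocab_size, activation='softmax'))           # distribution over the "next word"
model.compile(loss='categorical_crossentropy', optimizer='adam')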


I would like to model the process as a regression problem instead, by re-using the learned embeddings and then minimising the distance between the predicted and the real embedding.

Basically:

[Sequence of Word Indices] -> [Embedding] -> [LSTM] -> [Embedding Vector of the "next word"]

My problem is: since the model is learning the embeddings on the fly, how could I feed the output in the same way I feed the input (as word indices), and then just tell the model "but before you use the output, replace it by its embedding vector"?


Thank you very much for all help :-)

In the training phase:

You can use two inputs (one for the target, one for the input, with an offset of 1 between the two sequences) and reuse the embedding layer. If your input sentence is [1, 2, 3, 4], you can generate two sequences from it: in = [1, 2, 3], out = [2, 3, 4]. Then you can use Keras' functional API to reuse the embedding layer:

emb1 = Embedding(in)
emb2 = Embedding(out)
predict_emb = LSTM(emb1)
loss = mean_squared_error(emb2, predict_emb)

Note that this is not Keras code, just pseudocode.
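
A runnable version of the same idea, sketched with Keras 2's functional API, could look roughly like this. The MSE between the predicted and target embeddings is attached with add_loss so that no external target array is needed; all sizes and names are placeholders:

import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM
from keras import backend as K

vocab_size = 10000   # placeholder vocabulary size
emb_dim = 128        # placeholder embedding dimension
seq_len = 3          # placeholder sequence length

# Two integer inputs: the input sequence and the target sequence (shifted by one).
in_seq = Input(shape=(seq_len,), dtype='int32', name='in_seq')
out_seq = Input(shape=(seq_len,), dtype='int32', name='out_seq')

# One Embedding layer instance applied to both inputs, so the same
# weights are reused for the input and the target.
shared_emb = Embedding(vocab_size, emb_dim, name='shared_emb')
emb_in = shared_emb(in_seq)    # (batch, seq_len, emb_dim)
emb_out = shared_emb(out_seq)  # (batch, seq_len, emb_dim)

# Predict an embedding vector at every timestep.
pred_emb = LSTM(emb_dim, return_sequences=True)(emb_in)

# The MSE between predicted and target embeddings becomes an internal loss,
# so fit() needs no separate y array.
model = Model(inputs=[in_seq, out_seq], outputs=pred_emb)
model.add_loss(K.mean(K.square(emb_out - pred_emb)))
model.compile(optimizer='adam')

# Toy data: the sentence [1, 2, 3, 4] gives in = [1, 2, 3] and out = [2, 3, 4].
sentence = np.array([[1, 2, 3, 4]])
x_in, x_out = sentence[:, :-1], sentence[:, 1:]
model.fit([x_in, x_out], epochs=1)

One design point to be aware of: because the target branch emb_out is trainable as well, the model could in principle shrink the loss by collapsing the embeddings; stopping gradients through the target branch (e.g. with K.stop_gradient) or using a pretrained, frozen embedding are possible ways around this.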

In the testing phase:

Typically, you'll need to write your own decode function. First, you choose a word (or a few words) to start from. Then, you feed this word (or short word sequence) to the network to predict the next word's embedding. At this step you can define your own sampling function: for example, you may want to choose the word whose embedding is nearest to the predicted one as the next word, or you may want to sample the next word from a distribution in which words whose embeddings are nearer to the predicted embedding have a larger probability of being chosen. Once you have chosen the next word, feed it to the network, predict the next one, and so forth.

So, you need to generate one word (to put it another way, one embedding) at a time, rather than inputting a whole sequence to the network.
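
A rough sketch of such a decode loop, assuming you have built a single-input inference model that maps a window of word indices to predicted embeddings (e.g. Model(in_seq, pred_emb) reusing the trained layers) and have pulled the embedding matrix out with shared_emb.get_weights()[0]; these names are placeholders:

import numpy as np

def decode(model, embedding_matrix, seed_indices, num_words, seq_len):
    # Greedy decoding: at each step, predict an embedding and pick the word
    # whose embedding vector is nearest in Euclidean distance.
    generated = list(seed_indices)
    for _ in range(num_words):
        window = generated[-seq_len:]
        window = [0] * (seq_len - len(window)) + window  # left-pad with index 0 if needed
        x = np.array([window])
        pred = model.predict(x)[0, -1]                   # predicted embedding, shape (emb_dim,)
        dists = np.linalg.norm(embedding_matrix - pred, axis=1)
        next_idx = int(np.argmin(dists))                 # nearest word in embedding space
        generated.append(next_idx)
    return generated

Replacing the argmin with a probability-weighted draw (for example over softmax(-dists)) would give the distribution-based sampling described above.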

If the above statements are too abstract for you, here's a good example: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

Line 85 is the introduction part, which randomly chooses a small piece of text from the corpus to work on. From line 90 on there's a loop, in which each step samples a character (this is a char-rnn, so each timestep inputs a char; for your case it should be a word, not a char): L95 predicts the next char's distribution, and L96 samples from that distribution. Hope this is clear enough.
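
The sampling step that L95-L96 refer to is essentially a temperature-weighted draw from the predicted distribution; a helper along these lines (close to what that example uses, with preds being the network's softmax output) does the job:

import numpy as np

def sample(preds, temperature=1.0):
    # Re-weight the predicted probabilities by a temperature and draw one index.
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

For the regression variant discussed here, the analogue is sampling over distances in embedding space rather than over a softmax output.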


 