
How to calculate perplexity for a language model trained using keras?

Using Python 2.7 Anaconda on Windows 10

I have trained a GRU neural network to build a language model using Keras:

from keras.models import Sequential
from keras.layers import GRU, Dropout, Dense, Activation

print('Build model...')
model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

How do I calculate the perplexity of this language model? For example, NLTK offers a perplexity calculation function for its models.

I see that you have also followed the Keras tutorial on language models, which to my understanding is not entirely correct. This is because a language model should estimate the probability of every subsequence, e.g. P(c_1, c_2, ..., c_N) = P(c_1) P(c_2 | c_1) ... P(c_N | c_{N-1} ... c_1). However, assuming your input is a matrix of shape sequence_length × #characters and your target is the character following the sequence, the output of your model will only yield the last term, P(c_N | c_{N-1} ... c_1).

Given that the perplexity is P(c_1, c_2, ..., c_N)^{-1/N}, you cannot get all of the terms this way. This is why I recommend using the TimeDistributedDense layer. It will give you a matrix of shape sequence_length × #characters, where every row is a probability distribution over the characters; call it proba.
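A minimal sketch of that change to the model, assuming the modern Keras API (where the old TimeDistributedDense layer is spelled TimeDistributed(Dense(...))); maxlen and n_chars are placeholder values standing in for your own:

```python
# Sketch of the suggested per-timestep model (assumes current Keras, where
# TimeDistributedDense is written TimeDistributed(Dense(...))).
# maxlen and n_chars are placeholder values; substitute your own.
from keras.models import Sequential
from keras.layers import GRU, Dropout, TimeDistributed, Dense, Activation

maxlen, n_chars = 40, 60

model = Sequential()
model.add(GRU(512, return_sequences=True, input_shape=(maxlen, n_chars)))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=True))  # keep the time axis this time
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(n_chars)))  # one output per time step
model.add(Activation('softmax'))            # each row becomes a distribution
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```

Calling model.predict on one input sequence then yields a sequence_length × #characters matrix of per-step character distributions, i.e. one proba matrix per sequence.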

From every row of proba, you need the column that contains the prediction for the correct character:

correct_proba = proba[np.arange(maxlen), yTest]

assuming yTest is a vector containing the index of the correct character at every time step.

Then the perplexity for a sequence (and you have to average over all your training sequences) is:

np.power(2, -np.sum(np.log2(correct_proba)) / maxlen)

(Note that the logarithm base must match the power base: np.log2 pairs with np.power(2, ...); equivalently, np.exp(-np.sum(np.log(correct_proba)) / maxlen). For a single sequence, correct_proba is a 1-D vector, so no axis argument is needed.)
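Putting the last two steps together, here is a self-contained numpy sketch of the computation, using a random toy proba matrix in place of the model's output (all values and names here are illustrative):

```python
# Toy end-to-end perplexity computation for a single sequence.
# proba stands in for one sequence's model output; yTest for the true indices.
import numpy as np

maxlen, n_chars = 4, 3
rng = np.random.default_rng(0)
proba = rng.random((maxlen, n_chars))
proba /= proba.sum(axis=1, keepdims=True)  # make each row a distribution
yTest = np.array([0, 2, 1, 1])             # correct character index per step

# Pick, for every time step, the probability assigned to the true character.
correct_proba = proba[np.arange(maxlen), yTest]

# Perplexity: 2 to the average negative log2-probability per character.
perplexity = np.power(2, -np.sum(np.log2(correct_proba)) / maxlen)
```

Since every entry of correct_proba is at most 1, the perplexity is always at least 1; a perfect model (probability 1 at every step) would score exactly 1.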

PS. I would rather have written the explanation in LaTeX.
