Using Keras, how can I load weights generated from CuDNNLSTM into an LSTM model?

I've developed a NN model with Keras, based on the LSTM layer. To increase speed on Paperspace (a GPU cloud processing infrastructure), I've replaced the LSTM layer with the new CuDNNLSTM layer. However, this is usable only on machines with GPU cuDNN support. PS: CuDNNLSTM is available only on Keras master, not in the latest release.

So I've generated the weights and saved them in hdf5 format on the cloud, and I'd like to use them locally on my MacBook. Since the CuDNNLSTM layer is not available there, I've switched back to LSTM for my local installation only.

Reading this tweet about CuDNN from @fchollet, I thought it would work just fine to simply read the weights back into the LSTM model.

However, when I try to import them, Keras throws this error:

Traceback (most recent call last):
{...}
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096].
{...}
ValueError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096]

Analyzing the hdf5 files with h5cat, I can see that the two structures are different.

TL;DR

I cannot load weights generated from CuDNNLSTM into an LSTM model. Am I doing something wrong? How can I get them to work seamlessly?

Here is my model:

SelectedLSTM = CuDNNLSTM if is_gpu_enabled() else LSTM
# ...
model = Sequential()
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=True, input_shape=(SEQ_LENGTH, vocab_size)))
model.add(Dropout(0.2))
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=False))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

The reason is that the CuDNNLSTM layer has a bias vector twice as long as that of LSTM (4096 versus 2048 entries in the error above). This comes from the underlying implementation of the cuDNN API. You can compare the following equations (copied from the cuDNN user's guide) to the usual LSTM equations:

[Figure: cuDNN LSTM equations]
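For reference, the cuDNN LSTM formulation is usually written roughly as below. This is a reconstruction based on the cuDNN documentation rather than a copy of the original figure, so the symbol names are assumptions: W denotes input weights, R recurrent weights, and each gate carries two biases, b_W and b_R.

```latex
\begin{aligned}
i_t  &= \sigma(W_i x_t + R_i h_{t-1} + b_{W_i} + b_{R_i}) \\
f_t  &= \sigma(W_f x_t + R_f h_{t-1} + b_{W_f} + b_{R_f}) \\
o_t  &= \sigma(W_o x_t + R_o h_{t-1} + b_{W_o} + b_{R_o}) \\
c'_t &= \tanh(W_c x_t + R_c h_{t-1} + b_{W_c} + b_{R_c}) \\
c_t  &= f_t \circ c_{t-1} + i_t \circ c'_t \\
h_t  &= o_t \circ \tanh(c_t)
\end{aligned}
```

The usual LSTM equations have a single bias per gate, which corresponds to b = b_W + b_R for each of the four gates.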

cuDNN uses two bias terms, so the number of bias weights is doubled. To convert it back to what LSTM uses, the two bias terms need to be summed.
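The bias summation can be sketched in NumPy as follows. This is only an illustration, not the code from the merged PR: merge_cudnn_bias is a hypothetical helper, and it assumes the two bias vectors are stored back to back in a single 1-D array. It also covers only the bias; the kernel arrays may need their own layout conversion, which is not shown here.

```python
import numpy as np


def merge_cudnn_bias(cudnn_bias):
    """Sum the two stacked bias vectors into one LSTM-sized bias.

    Hypothetical helper: assumes `cudnn_bias` is a 1-D array holding
    two equally sized bias vectors concatenated one after the other.
    """
    cudnn_bias = np.asarray(cudnn_bias)
    if cudnn_bias.ndim != 1 or cudnn_bias.shape[0] % 2 != 0:
        raise ValueError("expected a 1-D bias with an even number of entries")
    # Split into the two halves and add them element-wise.
    first_half, second_half = np.split(cudnn_bias, 2)
    return first_half + second_half


# With 512 hidden units (matching the shapes in the error message),
# the doubled bias has 8 * 512 = 4096 entries and the merged one 4 * 512 = 2048.
units = 512
cudnn_bias = np.arange(8 * units, dtype=np.float32)
lstm_bias = merge_cudnn_bias(cudnn_bias)
print(cudnn_bias.shape, "->", lstm_bias.shape)
```

Because the two halves are simply added, the order in which they are stored does not affect the result.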

I've submitted a PR to do the conversion, and it has been merged. You can install the latest Keras from GitHub, and the weight-loading problem should be solved.

Just to add to @Yu-Yang's answer above: the latest Keras will automatically convert the CuDNNLSTM weights to LSTM, but it won't change your .json model architecture for you.

To run inference on LSTM, you'll need to open the JSON file and manually change all instances of CuDNNLSTM to LSTM. Then run model_from_json to load your model, and load_weights to load your weights.
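If you'd rather not edit the file by hand, the rename can be scripted with the standard json module. This is a minimal sketch: swap_layer_class is a hypothetical helper, and it assumes layer types appear under a "class_name" key, as in the JSON that Keras writes. The sample architecture below is made up for illustration.

```python
import json


def swap_layer_class(architecture_json, old="CuDNNLSTM", new="LSTM"):
    """Return the architecture JSON with every `old` layer class renamed to `new`.

    Hypothetical helper: assumes layer types are stored under a
    "class_name" key somewhere in the nested JSON structure.
    """
    def walk(node):
        if isinstance(node, dict):
            if node.get("class_name") == old:
                node["class_name"] = new
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    tree = json.loads(architecture_json)
    walk(tree)
    return json.dumps(tree)


# Made-up, heavily trimmed architecture used only to exercise the helper.
sample = json.dumps({
    "class_name": "Sequential",
    "config": [
        {"class_name": "CuDNNLSTM", "config": {"units": 512}},
        {"class_name": "Dense", "config": {"units": 10}},
    ],
})
converted = swap_layer_class(sample)
print(converted)
```

The returned string can then be passed to model_from_json, followed by load_weights on the resulting model. Only the class name is changed here; any LSTM setting missing from the original config falls back to the layer's defaults, so it is worth checking that predictions match the GPU model.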

At first I tried running load_weights without manually changing the CuDNNLSTM model, and got a bunch of errors.
