Using Keras, how can I load weights generated from CuDNNLSTM into an LSTM model?
I've developed a neural network model with Keras, based on the LSTM layer. To increase speed on Paperspace (a GPU cloud processing infrastructure), I've replaced the LSTM layer with the new CuDNNLSTM layer. However, this is usable only on machines with GPU cuDNN support. PS: CuDNNLSTM is available only on Keras master, not in the latest release.
So I've generated the weights and saved them in hdf5 format on the cloud, and I'd like to use them locally on my MacBook. Since the CuDNNLSTM layer is not available there, I've switched back to LSTM for my local installation only.
Reading this tweet about CuDNN from @fchollet, I thought it would work just fine, simply reading the weights back into the LSTM model.
However, when I try to import them, Keras throws this error:
Traceback (most recent call last):
{...}
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096].
{...}
ValueError: Dimension 0 in both shapes must be equal, but are 2048 and 4096 for 'Assign_2' (op: 'Assign') with input shapes: [2048], [4096]
Analyzing the hdf5 files with h5cat, I can see that the two structures are different.
TL;DR
I cannot load weights generated from CuDNNLSTM into an LSTM model. Am I doing something wrong? How can I get them to work seamlessly?
Here is my model:
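from keras.models import Sequential
from keras.layers import LSTM, CuDNNLSTM, Dense, Dropout, Activation

# Pick the cuDNN-backed layer on GPU machines, plain LSTM otherwise
# (is_gpu_enabled() is my own helper, not shown here).
SelectedLSTM = CuDNNLSTM if is_gpu_enabled() else LSTM
# ...
model = Sequential()
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=True, input_shape=(SEQ_LENGTH, vocab_size)))
model.add(Dropout(0.2))
model.add(SelectedLSTM(HIDDEN_DIM, return_sequences=False))
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')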
The reason is that the CuDNNLSTM layer has a bias vector twice as long as that of LSTM, which matches the [2048] vs. [4096] shapes in the error message. This comes from the underlying implementation of the cuDNN API. You can compare the following equations (copied from the cuDNN user's guide) to the usual LSTM equations:
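The equation image is not reproduced here. As best I can reconstruct it from the cuDNN documentation for LSTM mode, the equations look roughly like this (treat the exact symbols as my transcription, not a verbatim copy):

```latex
\begin{aligned}
i_t  &= \sigma(W_i x_t + R_i h_{t-1} + b_{Wi} + b_{Ri}) \\
f_t  &= \sigma(W_f x_t + R_f h_{t-1} + b_{Wf} + b_{Rf}) \\
o_t  &= \sigma(W_o x_t + R_o h_{t-1} + b_{Wo} + b_{Ro}) \\
c'_t &= \tanh(W_c x_t + R_c h_{t-1} + b_{Wc} + b_{Rc}) \\
c_t  &= f_t \circ c_{t-1} + i_t \circ c'_t \\
h_t  &= o_t \circ \tanh(c_t)
\end{aligned}
```

The usual LSTM formulation has a single bias per gate, e.g. $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, so $b_i$ there plays the role of $b_{Wi} + b_{Ri}$ here.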
cuDNN uses two bias terms, so the number of bias weights is doubled. To convert it back to what LSTM uses, the two bias terms need to be summed.
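To make the summing step concrete, here is a minimal NumPy sketch. It is not the code from the Keras PR; it assumes the CuDNNLSTM bias is stored as two concatenated halves (input biases followed by recurrent biases, each of length 4 * units), and the function name `merge_cudnn_bias` is made up for illustration.

```python
import numpy as np


def merge_cudnn_bias(cudnn_bias):
    """Sum the two halves of a CuDNNLSTM-style bias vector.

    Assumption: the vector is laid out as
    [input biases (4 * units), recurrent biases (4 * units)].
    """
    cudnn_bias = np.asarray(cudnn_bias)
    if cudnn_bias.ndim != 1 or cudnn_bias.shape[0] % 2 != 0:
        raise ValueError("expected a 1-D bias vector of even length")
    # Split into the two bias sets and add them element-wise.
    input_bias, recurrent_bias = np.split(cudnn_bias, 2)
    return input_bias + recurrent_bias


# Dummy data with the sizes from the error message (units = 512).
units = 512
cudnn_bias = np.arange(8 * units, dtype="float32")
lstm_bias = merge_cudnn_bias(cudnn_bias)
print(cudnn_bias.shape, "->", lstm_bias.shape)  # (4096,) -> (2048,)
```

This only covers the bias. Depending on the Keras version, the kernels may need their own handling, which is one more reason to rely on the built-in conversion rather than doing it by hand.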
I've submitted a PR to do the conversion, and it has been merged. You can install the latest Keras from GitHub, and the weight-loading problem should be solved.
Just to add to @Yu-Yang's answer above, the latest Keras will automatically convert the CuDNNLSTM weights to LSTM, but it won't change your .json model architecture for you.
To run inference on LSTM, you'll need to open the JSON file and manually change all instances of CuDNNLSTM to LSTM. Then run model_from_json to load your model, and load_weights to load your weights.
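If you'd rather not edit the file by hand, the rename can be scripted. The following is only a sketch: the helper name `cudnn_to_lstm_architecture` is made up, the example JSON is a tiny hand-written stand-in for a real `model.to_json()` output, and the weights file name in the trailing comment is hypothetical.

```python
import json


def cudnn_to_lstm_architecture(architecture_json):
    """Return the architecture JSON with every "CuDNNLSTM" class_name
    renamed to "LSTM". Everything else is left untouched."""

    def rewrite(node):
        # Walk nested dicts/lists and rename matching class_name entries.
        if isinstance(node, dict):
            if node.get("class_name") == "CuDNNLSTM":
                node["class_name"] = "LSTM"
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    architecture = json.loads(architecture_json)
    rewrite(architecture)
    return json.dumps(architecture)


# Hand-written stand-in for a saved architecture (not real Keras output).
example = json.dumps({
    "class_name": "Sequential",
    "config": [
        {"class_name": "CuDNNLSTM", "config": {"units": 512}},
        {"class_name": "Dense", "config": {"units": 10}},
    ],
})
converted = cudnn_to_lstm_architecture(example)
print(converted)

# With Keras installed, the converted string would then be used like:
#   from keras.models import model_from_json
#   model = model_from_json(converted)
#   model.load_weights("weights.hdf5")  # hypothetical file name
```

Matching on the `class_name` key rather than doing a blind text replace avoids touching layer names or other strings that happen to contain "CuDNNLSTM".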
At first I tried running load_weights without manually changing the CuDNNLSTM model, and got a bunch of errors.