How to visualize RNN/LSTM weights in Keras/TensorFlow?
I've come across research publications and Q&A's discussing a need for inspecting RNN weights; some related answers are in the right direction, suggesting `get_weights()` - but how do I actually visualize the weights meaningfully? Namely, LSTMs and GRUs have gates, and all RNNs have channels that serve as independent feature extractors - so how do I (1) fetch per-gate weights, and (2) plot them in an informative manner?
Keras/TF build RNN weights in a well-defined order, which can be inspected from the source code or via `layer.__dict__` directly - then used to fetch per-kernel and per-gate weights; per-channel treatment can then be employed given a tensor's shape. The code & explanations below cover every possible case of a Keras/TF RNN, and should be easily expandable to any future API changes.
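As a minimal sketch of the per-gate fetching step (my own illustration, not the repository's code): Keras concatenates LSTM gates along the kernels' last axis in the order I, F, C, O, so slicing in `units`-sized blocks recovers each gate:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

# Toy model: LSTM kernel is (input_dim, 4*units),
# recurrent kernel is (units, 4*units), bias is (4*units,)
ipt = Input(batch_shape=(16, 100, 20))
out = LSTM(256, return_sequences=True)(ipt)
model = Model(ipt, out)

lstm = model.layers[1]
kernel, recurrent_kernel, bias = lstm.get_weights()
units = lstm.units

# Keras concatenates LSTM gates along the last axis in the order I, F, C, O
gates = ['INPUT', 'FORGET', 'CELL', 'OUTPUT']
per_gate = {g: kernel[:, i * units:(i + 1) * units] for i, g in enumerate(gates)}

for g, w in per_gate.items():
    print(g, w.shape)  # each (20, 256); columns are per-unit "channels"
```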
Also see visualizing RNN gradients, and an application to RNN regularization; unlike in the former post, I won't be including a simplified variant here, as it'd still be rather large and complex per the nature of weight extraction and organization; instead, simply view relevant source code in the repository (see next section).
Code source: See RNN (this post included w/ bigger images), my repository; included are `from keras` & `from tf.keras` variants.
Visualization methods:
EX 1: uni-LSTM, 256 units, weights -- `batch_shape = (16, 100, 20)` (input)

```python
rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')
```
`equate_axes=True` for an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal
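For a sense of what such a heatmap encodes, here is a bare-bones matplotlib sketch (mine, not `rnn_heatmap` itself) for the uni-LSTM above - one vertical block per gate, each column a unit (channel), each row an input feature; the color limits are illustrative:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(16, 100, 20))
model = Model(ipt, LSTM(256, return_sequences=True)(ipt))

kernel = model.layers[1].get_weights()[0]   # (20, 1024) = (input_dim, 4*units)
units = model.layers[1].units

fig, ax = plt.subplots(figsize=(12, 3))
im = ax.imshow(kernel, cmap='bwr', vmin=-.4, vmax=.4, aspect='auto')
for g in range(1, 4):                       # gate boundaries: I | F | C | O
    ax.axvline(g * units - .5, color='k', linewidth=2)
ax.set_xlabel('gate x unit (channel)')
ax.set_ylabel('input feature')
fig.colorbar(im)
plt.show()
```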
EX 2: bi-CuDNNLSTM, 256 units, weights -- `batch_shape = (16, 100, 16)` (input)

```python
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))
```
`CuDNNLSTM` (and `CuDNNGRU`) biases are defined and initialized differently - something that can't be inferred from histograms
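Concretely, a shapes-only illustration (the CuDNN layout is per Keras' cudnn_recurrent source; the CuDNN line is left as a comment since those layers require a GPU):

```python
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

ipt = Input(batch_shape=(16, 100, 16))
model = Model(ipt, LSTM(256, return_sequences=True)(ipt))

# Plain LSTM: one fused bias of shape (4*units,)
print(model.layers[1].get_weights()[2].shape)  # (1024,)

# CuDNNLSTM (GPU-only, e.g. tf.compat.v1.keras.layers.CuDNNLSTM) keeps
# separate input & recurrent bias sets, fused as (8*units,) -> (2048,) here,
# so its bias histogram isn't directly comparable to plain LSTM's
```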
EX 3: uni-CuDNNGRU, 64 units, weight gradients -- `batch_shape = (16, 100, 16)` (input)

```python
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)
```
Accomplished with `absolute_value=True` and a greyscale colormap
`New` is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
`Reset` is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping
BONUS EX: LSTM NaN detection, 512 units, weights -- `batch_shape = (16, 100, 16)` (input)
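The detection idea, as a rough sketch (an illustrative helper of my own, not the repository's implementation): scan each weight tensor per gate and report non-finite entries:

```python
import numpy as np

def report_nans(lstm_layer, gate_names=('INPUT', 'FORGET', 'CELL', 'OUTPUT')):
    """Print per-gate NaN/Inf counts for an LSTM layer's weights (illustrative)."""
    units = lstm_layer.units
    for w, kind in zip(lstm_layer.get_weights(), ('kernel', 'recurrent', 'bias')):
        w = np.atleast_2d(w)  # so the bias slices the same way as the kernels
        for i, gate in enumerate(gate_names):
            block = w[..., i * units:(i + 1) * units]
            n_bad = int(np.sum(~np.isfinite(block)))
            if n_bad:
                print(f"{kind}/{gate}: {n_bad} NaN/Inf of {block.size}")

# e.g. report_nans(model.layers[1]) after a suspect train step
```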