
How to visualize RNN/LSTM weights in Keras/TensorFlow?

I've come across research publications and Q&A's discussing a need for inspecting RNN weights; some related answers are in the right direction, suggesting get_weights() - but how do I actually visualize the weights meaningfully? Namely, LSTMs and GRUs have gates, and all RNNs have channels that serve as independent feature extractors - so how do I (1) fetch per-gate weights, and (2) plot them in an informative manner?

Keras/TF build RNN weights in a well-defined order, which can be inspected from the source code or via layer.__dict__ directly - then used to fetch per-kernel and per-gate weights; per-channel treatment can then be applied given a tensor's shape. The code and explanations below cover every possible case of a Keras/TF RNN, and should be easily expandable to any future API changes.
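To make the idea concrete, here is a minimal sketch (not the repository's code): the model, data, and training below are hypothetical placeholders, while the gate order i, f, c, o and the weight shapes are the standard Keras LSTM layout, so per-gate blocks can be sliced out by units.

import numpy as np
from tensorflow import keras

# Hypothetical model matching EX 1's stated shapes, just to have concrete weights to slice
ipt = keras.layers.Input(batch_shape=(16, 100, 20))
out = keras.layers.LSTM(256, name='lstm')(ipt)
out = keras.layers.Dense(1, activation='sigmoid')(out)
model = keras.Model(ipt, out)
model.compile('adam', 'binary_crossentropy')

x = np.random.randn(16, 100, 20).astype('float32')
y = np.random.randint(0, 2, (16, 1)).astype('float32')
model.train_on_batch(x, y)   # one update so weights deviate from initialization

# LSTM weights: kernel (20, 1024), recurrent_kernel (256, 1024), bias (1024,);
# the last axis concatenates the four gates in Keras order: i, f, c, o
kernel, recurrent_kernel, bias = model.get_layer('lstm').get_weights()
units = model.get_layer('lstm').units
gate_names = ['input', 'forget', 'cell', 'output']

for i, name in enumerate(gate_names):
    k  = kernel[:, i * units:(i + 1) * units]            # input-to-hidden weights, this gate
    rk = recurrent_kernel[:, i * units:(i + 1) * units]  # hidden-to-hidden weights, this gate
    b  = bias[i * units:(i + 1) * units]
    print(name, k.shape, rk.shape, b.shape)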

Also see visualizing RNN gradients, and an application to RNN regularization; unlike in the former post, I won't be including a simplified variant here, as it would still be rather large and complex given the nature of weight extraction and organization; instead, simply view the relevant source code in the repository (see next section).


Code source: See RNN (this post included w/ bigger images), my repository; included are:

  • Activations visualization
  • Weights visualization
  • Activations gradients visualization
  • Weights gradients visualization
  • Docstrings explaining all functionality
  • Support for Eager, Graph, TF1, TF2, and from keras & from tf.keras
  • Greater visual customizability than shown in examples

Visualization methods:

  • 2D heatmap: plot weight distributions per gate, per kernel, per direction; clearly shows kernel-to-hidden relations
  • histogram: plot weight distributions per gate, per kernel, per direction; loses context info (a bare-bones sketch of both views follows this list)
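As a rough illustration of what the two views boil down to (the repository's functions are far more featureful), here is a sketch that reuses kernel, units, and gate_names from the slicing example above and assumes matplotlib is available:

import matplotlib.pyplot as plt

# 2D heatmap idea: plot the kernel as an image, mark gate boundaries with vertical lines
plt.imshow(kernel, cmap='bwr', aspect='auto')
for g in range(1, 4):
    plt.axvline(g * units, color='k', linewidth=1)   # separate the four gates
plt.colorbar()
plt.xlabel('gate channels (i | f | c | o)')
plt.ylabel('input dimension')
plt.show()

# Histogram counterpart: weight distribution per gate; channel/context info is lost
for i, name in enumerate(gate_names):
    plt.hist(kernel[:, i * units:(i + 1) * units].ravel(), bins=100, alpha=0.5, label=name)
plt.legend()
plt.show()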

EX 1: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)
rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')

  • Top plot is a histogram subplot grid, showing weight distributions per kernel, and within each kernel, per gate
  • Second plot sets equate_axes=True for an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal
  • Last plot is a heatmap of the same weights, with gate separations marked by vertical lines, and bias weights also included
  • Unlike histograms, the heatmap preserves channel/context information: input-to-hidden and hidden-to-hidden transforming matrices can be clearly distinguished
  • Note the large concentration of maximal values at the Forget gate; as trivia, in Keras (and usually), bias gates are all initialized to zeros, except the Forget bias, which is initialized to ones (Keras's unit_forget_bias=True default; a quick check of this is sketched below)
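A quick, standalone check of that forget-bias trivia (the untrained layer here is hypothetical; the slicing relies on the i | f | c | o gate order noted earlier):

from tensorflow import keras

fresh = keras.Sequential([keras.layers.LSTM(256, batch_input_shape=(16, 100, 20))])
b = fresh.layers[0].get_weights()[2]   # bias, shape (1024,) = 4 gates x 256 units, order i | f | c | o
print(b[:256].max(), b[256:512].min(), b[512:].max())   # -> 0.0 1.0 0.0 with default unit_forget_bias=True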


EX 2: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))

  • Bidirectional is supported by both; biases are included in the histograms in this example
  • Note again the bias heatmaps; they no longer appear to reside in the same locality as in EX 1. Indeed, CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms (a quick way to inspect such layout differences yourself is sketched below)
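One simple way to see how a given layer type lays out its weight variables is to print their names and shapes (a generic sketch reusing the hypothetical model from the first example; names and shapes vary with the layer type and TF version):

def show_rnn_weights(layer):
    # print each weight variable's name and shape to see how it is laid out
    for w in layer.weights:
        print(f'{w.name:<45} {tuple(w.shape)}')

show_rnn_weights(model.get_layer('lstm'))
# e.g. kernel (20, 1024), recurrent_kernel (256, 1024), bias (1024,) for a plain LSTM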



EX 3: uni-CuDNNGRU, 64 units, weights gradients -- batch_shape = (16, 100, 16) (input)
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)

  • We may wish to visualize gradient intensity, which can be done via absolute_value=True and a greyscale colormap
  • Gate separations are apparent even without explicit separating lines in this example:
    • New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
    • Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping
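For context, "weights gradients" here are gradients of the loss with respect to the layer's weight tensors; a minimal TF2 eager sketch of obtaining them yourself (reusing the hypothetical LSTM model, x, and y from the first sketch; the same idea applies to EX 3's GRU, and the repository handles Graph/TF1 modes internally):

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
rnn = model.get_layer('lstm')
with tf.GradientTape() as tape:
    preds = model(x, training=True)
    loss = loss_fn(y, preds)
grads = tape.gradient(loss, rnn.trainable_weights)   # [d_kernel, d_recurrent_kernel, d_bias]
for w, g in zip(rnn.trainable_weights, grads):
    print(w.name, float(tf.reduce_max(tf.abs(g))))   # peak gradient magnitude per weight tensor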


BONUS EX: LSTM NaN detection, 512 units, weights -- batch_shape = (16, 100, 16) (input)

  • Both the heatmap and the histogram come with built-in NaN detection - kernel-, gate-, and direction-wise
  • Heatmap will print NaNs to console, whereas histogram will mark them directly on the plot
  • Both will set NaN values to zero before plotting; in the example below, all related non-NaN weights were already zero (a bare-bones version of the check is sketched after this list)
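A bare-bones version of the NaN check itself (the built-in detection in rnn_histogram / rnn_heatmap is more thorough and also handles bidirectional layers); this reuses model, gate_names, and units from the first sketch:

import numpy as np

for w, wname in zip(model.get_layer('lstm').get_weights(),
                    ['kernel', 'recurrent_kernel', 'bias']):
    w2d = np.atleast_2d(w)                      # bias is 1D; make slicing uniform
    for i, gate in enumerate(gate_names):
        block = w2d[..., i * units:(i + 1) * units]
        n_nan = int(np.isnan(block).sum())
        if n_nan:
            print(f'{wname}/{gate}: {n_nan} NaN values')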
