
How to visualize RNN/LSTM weights in Keras/TensorFlow?

I've come across research publications and Q&A's discussing a need for inspecting RNN weights; some related answers are in the right direction, suggesting get_weights() - but how do I actually visualize the weights meaningfully? Namely, LSTMs and GRUs have gates, and all RNNs have channels that serve as independent feature extractors - so how do I (1) fetch per-gate weights, and (2) plot them in an informative manner?

Keras/TF build RNN weights in a well-defined order, which can be inspected from the source code or via layer.__dict__ directly - then used to fetch per-kernel and per-gate weights; per-channel treatment can then be applied given a tensor's shape. The code and explanations below cover every possible case of a Keras/TF RNN, and should be easily expandable to any future API changes.
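To make the idea concrete, here is a minimal sketch (not the repository's code): the model, data, and training below are hypothetical placeholders, while the gate order i, f, c, o and the weight shapes are the standard Keras LSTM layout, so per-gate blocks can be sliced out by units.

import numpy as np
from tensorflow import keras

# Hypothetical model matching EX 1's stated shapes, just to have concrete weights to slice
ipt = keras.layers.Input(batch_shape=(16, 100, 20))
out = keras.layers.LSTM(256, name='lstm')(ipt)
out = keras.layers.Dense(1, activation='sigmoid')(out)
model = keras.Model(ipt, out)
model.compile('adam', 'binary_crossentropy')

x = np.random.randn(16, 100, 20).astype('float32')
y = np.random.randint(0, 2, (16, 1)).astype('float32')
model.train_on_batch(x, y)   # one update so weights deviate from initialization

# LSTM weights: kernel (20, 1024), recurrent_kernel (256, 1024), bias (1024,);
# the last axis concatenates the four gates in Keras order: i, f, c, o
kernel, recurrent_kernel, bias = model.get_layer('lstm').get_weights()
units = model.get_layer('lstm').units
gate_names = ['input', 'forget', 'cell', 'output']

for i, name in enumerate(gate_names):
    k  = kernel[:, i * units:(i + 1) * units]            # input-to-hidden weights, this gate
    rk = recurrent_kernel[:, i * units:(i + 1) * units]  # hidden-to-hidden weights, this gate
    b  = bias[i * units:(i + 1) * units]
    print(name, k.shape, rk.shape, b.shape)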

Also see visualizing RNN gradients, and an application to RNN regularization; unlike in the former post, I won't be including a simplified variant here, as it would still be rather large and complex given the nature of weight extraction and organization; instead, simply view the relevant source code in the repository (see next section).


Code source: See RNN (this post included w/ bigger images), my repository; included are:

  • Activations visualization
  • Weights visualization
  • Activations gradients visualization
  • Weights gradients visualization
  • Docstrings explaining all functionality
  • Support for Eager, Graph, TF1, TF2, and from keras & from tf.keras
  • Greater visual customizability than shown in examples

Visualization methods:

  • 2D heatmap: plot weight distributions per gate, per kernel, per direction; clearly shows kernel-to-hidden relations
  • histogram: plot weight distributions per gate, per kernel, per direction; loses context info (a bare-bones sketch of both views follows this list)
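As a rough illustration of what the two views boil down to (the repository's functions are far more featureful), here is a sketch that reuses kernel, units, and gate_names from the slicing example above and assumes matplotlib is available:

import matplotlib.pyplot as plt

# 2D heatmap idea: plot the kernel as an image, mark gate boundaries with vertical lines
plt.imshow(kernel, cmap='bwr', aspect='auto')
for g in range(1, 4):
    plt.axvline(g * units, color='k', linewidth=1)   # separate the four gates
plt.colorbar()
plt.xlabel('gate channels (i | f | c | o)')
plt.ylabel('input dimension')
plt.show()

# Histogram counterpart: weight distribution per gate; channel/context info is lost
for i, name in enumerate(gate_names):
    plt.hist(kernel[:, i * units:(i + 1) * units].ravel(), bins=100, alpha=0.5, label=name)
plt.legend()
plt.show()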

EX 1: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)
rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')

  • Top plot is a histogram subplot grid, showing weight distributions per kernel, and within each kernel, per gate
  • Second plot sets equate_axes=True for an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal
  • Last plot is a heatmap of the same weights, with gate separations marked by vertical lines, and bias weights also included
  • Unlike histograms, the heatmap preserves channel/context information: input-to-hidden and hidden-to-hidden transforming matrices can be clearly distinguished
  • Note the large concentration of maximal values at the Forget gate; as trivia, in Keras (and usually), bias gates are all initialized to zeros, except the Forget bias, which is initialized to ones (Keras's unit_forget_bias=True default; a quick check of this is sketched below)
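A quick, standalone check of that forget-bias trivia (the untrained layer here is hypothetical; the slicing relies on the i | f | c | o gate order noted earlier):

from tensorflow import keras

fresh = keras.Sequential([keras.layers.LSTM(256, batch_input_shape=(16, 100, 20))])
b = fresh.layers[0].get_weights()[2]   # bias, shape (1024,) = 4 gates x 256 units, order i | f | c | o
print(b[:256].max(), b[256:512].min(), b[512:].max())   # -> 0.0 1.0 0.0 with default unit_forget_bias=True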


EX 2: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))

  • Bidirectional is supported by both; biases are included in the histograms in this example
  • Note again the bias heatmaps; they no longer appear to reside in the same locality as in EX 1. Indeed, CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms (a quick way to inspect such layout differences yourself is sketched below)
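One simple way to see how a given layer type lays out its weight variables is to print their names and shapes (a generic sketch reusing the hypothetical model from the first example; names and shapes vary with the layer type and TF version):

def show_rnn_weights(layer):
    # print each weight variable's name and shape to see how it is laid out
    for w in layer.weights:
        print(f'{w.name:<45} {tuple(w.shape)}')

show_rnn_weights(model.get_layer('lstm'))
# e.g. kernel (20, 1024), recurrent_kernel (256, 1024), bias (1024,) for a plain LSTM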



EX 3: uni-CuDNNGRU, 64 units, weights gradients -- batch_shape = (16, 100, 16) (input)
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)

  • We may wish to visualize gradient intensity, which can be done via absolute_value=True and a greyscale colormap
  • Gate separations are apparent even without explicit separating lines in this example:
    • New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow
    • Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping
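For context, "weights gradients" here are gradients of the loss with respect to the layer's weight tensors; a minimal TF2 eager sketch of obtaining them yourself (reusing the hypothetical LSTM model, x, and y from the first sketch; the same idea applies to EX 3's GRU, and the repository handles Graph/TF1 modes internally):

import tensorflow as tf

loss_fn = tf.keras.losses.BinaryCrossentropy()
rnn = model.get_layer('lstm')
with tf.GradientTape() as tape:
    preds = model(x, training=True)
    loss = loss_fn(y, preds)
grads = tape.gradient(loss, rnn.trainable_weights)   # [d_kernel, d_recurrent_kernel, d_bias]
for w, g in zip(rnn.trainable_weights, grads):
    print(w.name, float(tf.reduce_max(tf.abs(g))))   # peak gradient magnitude per weight tensor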


BONUS EX: LSTM NaN detection, 512 units, weights -- batch_shape = (16, 100, 16) (input)

  • Both the heatmap and the histogram come with built-in NaN detection - kernel-, gate-, and direction-wise
  • Heatmap will print NaNs to console, whereas histogram will mark them directly on the plot
  • Both will set NaN values to zero before plotting; in the example below, all related non-NaN weights were already zero (a bare-bones version of the check is sketched after this list)
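A bare-bones version of the NaN check itself (the built-in detection in rnn_histogram / rnn_heatmap is more thorough and also handles bidirectional layers); this reuses model, gate_names, and units from the first sketch:

import numpy as np

for w, wname in zip(model.get_layer('lstm').get_weights(),
                    ['kernel', 'recurrent_kernel', 'bias']):
    w2d = np.atleast_2d(w)                      # bias is 1D; make slicing uniform
    for i, gate in enumerate(gate_names):
        block = w2d[..., i * units:(i + 1) * units]
        n_nan = int(np.isnan(block).sum())
        if n_nan:
            print(f'{wname}/{gate}: {n_nan} NaN values')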
