
Tensorflow 2: Why the same `name` of `W`s and `b`s in Multi-Layer Perceptron Neural Network?

I'm new to TensorFlow 2 and reading the docs: https://www.tensorflow.org/api_docs/python/tf/Module

On this page, the part related to my question is the MLP example (copy-pasted from there):

class MLP(tf.Module):
  def __init__(self, input_size, sizes, name=None):
    super(MLP, self).__init__(name=name)
    self.layers = []
    with self.name_scope:
      for size in sizes:
        self.layers.append(Dense(input_dim=input_size, output_size=size))
        input_size = size

  @tf.Module.with_name_scope
  def __call__(self, x):
    for layer in self.layers:
      x = layer(x)
    return x
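For context, the `Dense` module used by `MLP` is defined earlier on the same docs page; it looks roughly like this (a sketch based on that page, details may differ slightly):

```python
import tensorflow as tf

class Dense(tf.Module):
  # A simple fully-connected layer, as sketched on the tf.Module docs page.
  def __init__(self, input_dim, output_size, name=None):
    super().__init__(name=name)
    # Note: every Dense instance creates its variables with the
    # same fixed names, 'w' and 'b'.
    self.w = tf.Variable(
        tf.random.normal([input_dim, output_size]), name='w')
    self.b = tf.Variable(tf.zeros([output_size]), name='b')

  def __call__(self, x):
    return tf.nn.relu(tf.matmul(x, self.w) + self.b)
```

Since each layer names its variables `'w'` and `'b'`, every layer inside the `mlp` name scope ends up with variables named `mlp/w:0` and `mlp/b:0`.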

and I don't understand why it produces the following output:

>>> module = MLP(input_size=5, sizes=[5, 5])
>>> module.variables
(<tf.Variable 'mlp/b:0' shape=(5,) ...>,
<tf.Variable 'mlp/w:0' shape=(5, 5) ...>,
<tf.Variable 'mlp/b:0' shape=(5,) ...>,
<tf.Variable 'mlp/w:0' shape=(5, 5) ...>,
)

I expected mlp/b:1 and mlp/w:1 to appear. I also tried the same code on my machine and got the same result for name, i.e. both mlp/b:0 and mlp/w:0 appear twice. Can anyone point out what I have missed? Does the result mean that the same W and b are reused?

From the docs,

A tf.Variable represents a tensor whose value can be changed by running ops on it. Specific ops allow you to read and modify the values of this tensor. Higher level libraries like tf.keras use tf.Variable to store model parameters.
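As a quick illustration of that quote (a minimal sketch, not part of the original question):

```python
import tensorflow as tf

# Create a variable, then modify its value by running an op on it.
v = tf.Variable([1.0, 2.0])
v.assign_add([1.0, 1.0])   # an op that mutates the variable in place
print(v.numpy())           # [2. 3.]
```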

The :0 is not in any way the layer number. It is used to identify the output tensor of an op in the underlying API.

For example, tf.Variable allocates one output tensor (:0), whereas a 3-way split via tf.split allocates three output tensors (:0, :1, :2) for its op in the computational graph:

tf.Variable([1])
# has output
# <tf.Variable 'Variable:0' shape=(1,) dtype=int32, numpy=array([1], dtype=int32)>

and

tf.compat.v1.disable_eager_execution()
a,b,c = tf.split([1,1,1], 3)
print(a.name)   # split:0
print(b.name)   # split:1
print(c.name)   # split:2
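The two points above, that :N indexes an op's outputs and that identical name strings do not imply a shared variable, can be sketched in plain Python (a toy illustration, not TensorFlow's actual implementation):

```python
# Toy model of the "<op_name>:<output_index>" naming scheme.
def output_names(op_name, num_outputs):
    return [f"{op_name}:{i}" for i in range(num_outputs)]

print(output_names("Variable", 1))  # ['Variable:0']
print(output_names("split", 3))     # ['split:0', 'split:1', 'split:2']

# Two distinct objects may carry the same name string; object identity,
# not the name, decides whether a parameter is actually reused.
class Var:
    def __init__(self, name):
        self.name = name

w1, w2 = Var("mlp/w:0"), Var("mlp/w:0")
assert w1.name == w2.name   # same name string...
assert w1 is not w2         # ...but independent objects
```

So the repeated mlp/w:0 and mlp/b:0 in the MLP output are four independent variables; the duplicate names simply reflect that each Dense layer names its own op's first output tensor the same way.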

Refer to this post
