
Tensorflow graph nodes are exchanged

I have trained a model by fine-tuning the pre-trained model ssd_mobilenet_v2_coco_2018. I used exactly the same pipeline.config file for training that is available inside the ssd_mobilenet_v2_coco_2018 pre-trained folder; I only removed the batch_norm_trainable: true flag and changed the number of classes to 4. After training the model on my custom dataset with 4 classes, I found that the concat and concat_1 nodes get exchanged with each other. In the pre-trained model, concat has shape 1x1917x1x4; after training it becomes 1x1917x5. I have attached both TensorBoard graph visualisation images; the first image is the pre-trained graph of ssd_mobilenet_v2_coco_2018.

[image: pre-trained model graph] [image: graph after fine-tuning]

The node exchange can be seen at the rightmost corner of the images. In the pre-trained graph, the Postprocess layer connects with concat_1 and Squeeze connects with concat. After training, the graph shows the complete reverse: the Postprocess layer connects with concat and Squeeze connects with concat_1. Furthermore, I found that in the pre-trained model graph the Preprocessor takes ToFloat as its input, while after training the graph shows Cast as the input to the Preprocessor. I have fed the input to the model as tfrecords.

Most probably, the difference is not in the graph but simply in the names of the nodes, i.e. the nodes concat and concat_1 on the left are the same nodes as, respectively, concat_1 and concat on the right.

The thing is, when you don't provide an explicit name for a node, TensorFlow needs to come up with one, and its naming convention is rather uninventive. The first time it needs to name a node, it simply uses the node's type. When it encounters the same situation again, it appends _ plus an increasing number to the name.

Take this example:

import tensorflow as tf

x = tf.placeholder(tf.float32, (1,), name='x')
y = tf.placeholder(tf.float32, (1,), name='y')
z = tf.placeholder(tf.float32, (1,), name='z')

xy = tf.concat([x, y], axis=0)  # named 'concat'
xz = tf.concat([x, z], axis=0)  # named 'concat_1'

The graph looks like this:

[image: graph with the two ops named concat and concat_1]

Now if we construct the same graph, but this time creating xz before xy, we get the following graph:

[image: the same graph, with the names concat and concat_1 swapped]

So the graph did not really change; only the names did. This is probably what happened in your case: the same operations were created, but not in the same order.
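
If you want to check that it really is only the names that changed, one rough sketch (the .pb paths below are placeholders for your own exported frozen graphs) is to dump the op types and names from both GraphDefs and compare them:

import tensorflow as tf

def op_summary(graph_def_path):
    # Load a frozen GraphDef and return its (op_type, op_name) pairs.
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(graph_def_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    return [(node.op, node.name) for node in graph_def.node]

before = op_summary('pretrained_frozen_inference_graph.pb')  # placeholder path
after = op_summary('finetuned_frozen_inference_graph.pb')    # placeholder path

# If only the auto-generated names differ, the multiset of op types still matches.
print(sorted(op for op, _ in before) == sorted(op for op, _ in after))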

The fact that names change for stateless nodes like concat is unimportant, because no weights will be misrouted when loading a saved model, for example. Nonetheless, if naming stability is important to you, you could either give explicit names to your operations or place them in distinct scopes:

xy = tf.concat([x, y], axis=0, name='xy')
xz = tf.concat([x, z], axis=0, name='xz')

[image: graph with explicitly named nodes xy and xz]
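
For the second option, here is a minimal sketch using tf.name_scope (the scope names branch_xy and branch_xz are just illustrative). Each branch gets its own prefix, so the auto-generated concat names can no longer collide or swap between branches:

with tf.name_scope('branch_xy'):
    xy = tf.concat([x, y], axis=0)  # named 'branch_xy/concat'
with tf.name_scope('branch_xz'):
    xz = tf.concat([x, z], axis=0)  # named 'branch_xz/concat'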

It is much more problematic if variables switch names. This is one of the reasons why tf.get_variable, which forces variables to have a name and raises an error when a name conflict occurs, was the preferred way of dealing with variables in the pre-TF2 era.
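
As a small TF1-style sketch of that behaviour (the variable and scope names here are made up for illustration), tf.get_variable refuses to silently create a second variable under an existing name:

import tensorflow as tf

with tf.variable_scope('detector'):
    w = tf.get_variable('weights', shape=(3, 3))

try:
    with tf.variable_scope('detector'):
        # Same name in the same scope: tf.get_variable will not rename silently.
        w_again = tf.get_variable('weights', shape=(3, 3))
except ValueError as e:
    print('Name conflict detected:', e)

# Reuse must be requested explicitly instead:
with tf.variable_scope('detector', reuse=True):
    w_shared = tf.get_variable('weights')  # returns the existing 'detector/weights'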
