Same weights, same implementation, but different results in Keras and PyTorch
I have an encoder and a decoder model (monodepth2). I tried to convert them from PyTorch to Keras using Onnx2Keras. The encoder was converted with Onnx2Keras; the decoder I built myself in Keras (TF2.3), copying the weights (NumPy arrays, including weight and bias) for each layer from PyTorch to Keras without any modification. But it turns out that both the Onnx2Keras-converted encoder and the self-built decoder fail to reproduce the same results. The cross-comparison pictures are below, but first I'll introduce the code of the decoder.
First the core layer. All the conv2d layers (Conv3x3, ConvBlock) are based on this, differing only in their dimensions or in an added activation:
import tensorflow as tf

# Conv3x3 (plain conv2d without BN or activation)
# There's also a ConvBlock, which is just "Conv3x3 + ELU activation", so I don't list it here.
def TF_Conv3x3(input_channel, filter_num, pad_mode='reflect', activate_type=None):
    # The padding is actually 'reflect', but it is applied with tf.pad() outside this layer,
    # so the convolution itself uses 'valid' padding.
    padding = 'valid'
    # For TF_ConvBlock, activate_type == 'elu'
    conv = tf.keras.layers.Conv2D(filters=filter_num, kernel_size=3, activation=activate_type,
                                  strides=1, padding=padding)
    return conv
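Because the padding is applied outside the layer, it helps to be sure what reflect padding does at the border. The sketch below uses NumPy only as a stand-in for illustration (the helper name `reflect_pad_nhwc` is hypothetical, not part of the original code). `np.pad(..., mode="reflect")` mirrors around the edge pixel without repeating it, which is the same convention as `tf.pad(..., mode='REFLECT')` and PyTorch's `nn.ReflectionPad2d`; a symmetric mode, which repeats the edge pixel, would not match.

```python
import numpy as np

def reflect_pad_nhwc(x, pad=1):
    """Reflect-pad the H and W axes of an NHWC array (hypothetical helper)."""
    return np.pad(x, [(0, 0), (pad, pad), (pad, pad), (0, 0)], mode="reflect")

# One 3x3 single-channel image holding the values 0..8.
x = np.arange(9, dtype=np.float32).reshape(1, 3, 3, 1)
padded = reflect_pad_nhwc(x)

print(padded.shape)        # (1, 5, 5, 1)
print(padded[0, :, :, 0])  # first row is [4, 3, 4, 5, 4]: the edge is not repeated
```

A 3x3 'valid' convolution on the padded tensor then returns the original spatial size, which is why the layer above can use padding='valid'.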
Then the structure. Note that the definition is EXACTLY the same as the original code. I think it must be some detail of the implementation.
from collections import OrderedDict

import numpy as np
import tensorflow as tf

# TF_ConvBlock and TF_ReflectPad2D_1 are defined elsewhere and not listed here.
def DepthDecoder_keras(num_ch_enc=np.array([64, 64, 128, 256, 512]), channel_first=False,
                       scales=range(4), num_output_channels=1):
    num_ch_dec = np.array([16, 32, 64, 128, 256])
    convs = OrderedDict()
    for i in range(4, -1, -1):
        # upconv_0
        num_ch_in = num_ch_enc[-1] if i == 4 else num_ch_dec[i + 1]
        num_ch_out = num_ch_dec[i]
        # convs[("upconv", i, 0)] = ConvBlock(num_ch_in, num_ch_out)
        convs[("upconv", i, 0)] = TF_ConvBlock(num_ch_in, num_ch_out, pad_mode='reflect')
        # upconv_1
        num_ch_in = num_ch_dec[i]
        if i > 0:
            num_ch_in += num_ch_enc[i - 1]
        num_ch_out = num_ch_dec[i]
        convs[("upconv", i, 1)] = TF_ConvBlock(num_ch_in, num_ch_out, pad_mode='reflect')  # Just Conv3x3 with ELU activation
    for s in scales:
        convs[("dispconv", s)] = TF_Conv3x3(num_ch_dec[s], num_output_channels, pad_mode='reflect')
    """
    Input_layer dims: (64, 96, 320), (64, 48, 160), (128, 24, 80), (256, 12, 40), (512, 6, 20)
    """
    x0 = tf.keras.layers.Input(shape=(96, 320, 64))
    # then define the rest of the input layers
    # (filled in from the dims listed above, converted to channels-last)
    x1 = tf.keras.layers.Input(shape=(48, 160, 64))
    x2 = tf.keras.layers.Input(shape=(24, 80, 128))
    x3 = tf.keras.layers.Input(shape=(12, 40, 256))
    x4 = tf.keras.layers.Input(shape=(6, 20, 512))
    input_features = [x0, x1, x2, x3, x4]
    """
    # connect layers
    """
    outputs = []
    ch = 1 if channel_first else 3  # axis used to concatenate the skip connection
    x = input_features[-1]
    for i in range(4, -1, -1):
        x = tf.pad(x, paddings=[[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')
        x = convs[("upconv", i, 0)](x)
        x = [tf.keras.layers.UpSampling2D()(x)]
        if i > 0:
            x += [input_features[i - 1]]
        x = tf.concat(x, ch)
        x = tf.pad(x, paddings=[[0, 0], [1, 1], [1, 1], [0, 0]], mode='REFLECT')
        x = convs[("upconv", i, 1)](x)
    # Only the full-resolution disparity (scale 0) is returned.
    x = TF_ReflectPad2D_1()(x)
    x = convs[("dispconv", 0)](x)
    disp0 = tf.math.sigmoid(x)
    """
    build keras Model ([input0, ...], [output0, ...])
    """
    # decoder = tf.keras.Model(input_features, outputs)
    decoder = tf.keras.Model(input_features, disp0)
    return decoder
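With default arguments, `tf.keras.layers.UpSampling2D()` doubles height and width by nearest-neighbour repetition. The NumPy sketch below reproduces that step and the channel concatenation that follows it, as a reference when checking the decoder against the PyTorch side. It assumes the original decoder also upsamples by 2x with nearest-neighbour interpolation; the helper name and the tiny arrays are hypothetical.

```python
import numpy as np

def upsample_nearest_2x_nhwc(x):
    """Repeat every pixel twice along H and W of an NHWC array (hypothetical helper)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# A 2x2 single-channel feature map.
x = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32).reshape(1, 2, 2, 1)
up = upsample_nearest_2x_nhwc(x)
print(up[0, :, :, 0])
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]

# Skip connection with 3 channels, concatenated on the channel axis (3 for NHWC).
skip = np.zeros((1, 4, 4, 3), dtype=np.float32)
merged = np.concatenate([up, skip], axis=3)
print(merged.shape)  # (1, 4, 4, 4)
```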
The cross-comparison is as follows. I would really appreciate it if anyone could offer some insights. Thanks!
Original results:
Original Encoder + Self-built Decoder:
ONNX-converted Encoder + Original Decoder (the texture is good, but the contrast is not enough; the car should be very close, i.e. a very bright color):
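A visual comparison like this can be backed up with a numeric one, layer by layer. The following is only a sketch under assumptions: the function name `compare_feature_maps` and the synthetic arrays are hypothetical, and in practice the two inputs would be the same intermediate feature map exported from each framework as NumPy arrays. The first layer where the difference jumps is a good place to look for a mismatch.

```python
import numpy as np

def compare_feature_maps(pt_nchw, tf_nhwc, atol=1e-4):
    """Compare a PyTorch-style NCHW array with a Keras-style NHWC array.

    Returns (max_abs_diff, within_tolerance). Hypothetical helper.
    """
    pt_as_nhwc = np.transpose(pt_nchw, (0, 2, 3, 1))  # NCHW -> NHWC
    if pt_as_nhwc.shape != tf_nhwc.shape:
        raise ValueError(f"shape mismatch: {pt_as_nhwc.shape} vs {tf_nhwc.shape}")
    max_abs_diff = float(np.max(np.abs(pt_as_nhwc - tf_nhwc)))
    return max_abs_diff, max_abs_diff <= atol

# Synthetic stand-ins for one feature map from each framework.
rng = np.random.default_rng(0)
pt_out = rng.standard_normal((1, 16, 6, 20)).astype(np.float32)
tf_out = np.transpose(pt_out, (0, 2, 3, 1)).copy()

diff, ok = compare_feature_maps(pt_out, tf_out)
print(diff, ok)  # 0.0 True

# Corrupt one value to mimic a layer that diverges.
tf_out[0, 0, 0, 0] += 1.0
diff_bad, ok_bad = compare_feature_maps(pt_out, tf_out)
print(ok_bad)  # False
```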
Solved!
It turns out there's indeed no problem with the implementation (at least no significant one). The problem is with copying the weights.
The original PyTorch weights have shape (H, W, 3, 3), where H and W here stand for the output and input channel counts, but the TF model requires (3, 3, W, H). So I permuted the axes with [3,2,1,0], overlooking that the two kernel axes (3, 3) also have their own order: that permutation swaps them, which transposes every 3x3 kernel.
So it should be weights.permute([2,3,1,0]), and all is well!
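The effect of the two permutations can be checked without either framework. The sketch below is a NumPy-only illustration, not the original code: it implements a naive 'valid' convolution in both layouts (the function names and the small random tensors are hypothetical) and shows that [2,3,1,0] reproduces the reference output while [3,2,1,0] does not. Because the kernel is square, both permutations give the same shape, so no error is raised in the wrong case.

```python
import numpy as np

def conv2d_valid_nchw(x, w_oihw):
    """Naive 'valid' convolution, PyTorch layout: x (C_in, H, W), w (O, I, kH, kW)."""
    o, _, kh, kw = w_oihw.shape
    _, h, wd = x.shape
    out = np.zeros((o, h - kh + 1, wd - kw + 1))
    for r in range(out.shape[1]):
        for c in range(out.shape[2]):
            patch = x[:, r:r + kh, c:c + kw]                 # (I, kH, kW)
            out[:, r, c] = np.tensordot(w_oihw, patch, axes=3)
    return out

def conv2d_valid_nhwc(x, w_hwio):
    """Naive 'valid' convolution, Keras layout: x (H, W, C_in), w (kH, kW, I, O)."""
    kh, kw, _, o = w_hwio.shape
    h, wd, _ = x.shape
    out = np.zeros((h - kh + 1, wd - kw + 1, o))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = x[r:r + kh, c:c + kw, :]                 # (kH, kW, I)
            out[r, c, :] = np.tensordot(patch, w_hwio, axes=3)
    return out

rng = np.random.default_rng(0)
w_torch = rng.standard_normal((4, 2, 3, 3))   # (out, in, kH, kW)
x_chw = rng.standard_normal((2, 6, 7))
reference = conv2d_valid_nchw(x_chw, w_torch)

x_hwc = np.transpose(x_chw, (1, 2, 0))
w_good = np.transpose(w_torch, (2, 3, 1, 0))  # (kH, kW, in, out)
w_bad = np.transpose(w_torch, (3, 2, 1, 0))   # (kW, kH, in, out): kernels transposed

out_good = np.transpose(conv2d_valid_nhwc(x_hwc, w_good), (2, 0, 1))
out_bad = np.transpose(conv2d_valid_nhwc(x_hwc, w_bad), (2, 0, 1))

print(w_good.shape == w_bad.shape)        # True: the shapes alone cannot reveal the bug
print(np.allclose(out_good, reference))   # True
print(np.allclose(out_bad, reference))    # False
```

For weights already exported as NumPy arrays, `np.transpose(w, (2, 3, 1, 0))` is the counterpart of the `permute` call above.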