
Why does TensorFlow have different dimensions for its Conv2D weights than PyTorch does?

I'm working on converting an old project written in TensorFlow v1.13 to PyTorch v1.4.0, and I noticed that TensorFlow and PyTorch have differently sized weight tensors for their 2D convolution layers.

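Here is my TensorFlow code:

import tensorflow as tf  # TensorFlow 1.x API

# img_tensor is the input image batch, defined elsewhere in the project
cnn = tf.layers.conv2d(img_tensor, 16, (3, 3), (1, 1), padding='SAME', name='cnn_1')
cnn = tf.layers.conv2d(cnn, 32, (3, 3), (1, 1), padding='SAME', name='cnn_2')

init = tf.global_variables_initializer()
with tf.Session() as sess:
   sess.run(init)
   # Map variable names to variables so the second layer's kernel can be looked up
   vars = {v.name: v for v in tf.trainable_variables()}
   print(sess.run(vars['cnn_2/kernel:0']).shape)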

Result

(3, 3, 1, 32)

Here is my PyTorch code:

from torch.nn import Module, Sequential, Conv2d

class Net(Module):
   def __init__(self):
      super(Net, self).__init__()
      self.create_cnn()

   def create_cnn(self):
      self.cnn_layers = Sequential(
         Conv2d(1, 16, 3, padding=1),
         Conv2d(16, 32, 3, padding=1)
      )

   def forward(self, x):
      return self.cnn_layers(x)

def weights_init(m):
   # Print the weight shape of the second conv layer only (the one with 32 output channels)
   if type(m) == Conv2d:
      if m.bias.shape[0] == 32:
         print(m.weight.data.shape)

model = Net()
model.apply(weights_init)

Result

torch.Size([32, 16, 3, 3])

This came up because my PyTorch model is not working, so I started going one layer at a time and comparing outputs between TensorFlow and PyTorch. For that to work, I had to set the weights of both models to the same values. When I got to the second CNN layer, setting the weights failed because the size was wrong, which confused me. After a little poking around, I found this difference.

It looks like TensorFlow is using the same kernel across all the channels, whereas PyTorch has a unique kernel for each channel. If this is the case, how can I replicate this in PyTorch?

After re-reading the PyTorch docs, I noticed that the groups parameter is exactly related to this. That'll teach me not to skim over parts of the docs. By setting groups=in_channels, I now get the size (32, 1, 3, 3) as desired.
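To make the effect of groups on the weight size concrete, here is a minimal sketch of the shape rule documented for torch.nn.Conv2d, whose weight has shape (out_channels, in_channels / groups, kernel_height, kernel_width). The helper name torch_conv2d_weight_shape is hypothetical and is not part of PyTorch; it only reproduces the arithmetic, so it runs without PyTorch installed.

```python
def torch_conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Return the shape that torch.nn.Conv2d documents for its weight tensor.

    Hypothetical helper for illustration only; it mirrors the documented rule
    (out_channels, in_channels // groups, kernel_height, kernel_width).
    """
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    kernel_height, kernel_width = kernel_size
    return (out_channels, in_channels // groups, kernel_height, kernel_width)


# Second layer from the question: 16 input channels, 32 output channels, 3x3 kernel.
default_shape = torch_conv2d_weight_shape(16, 32, (3, 3))
# Same layer with groups=in_channels, as described above.
grouped_shape = torch_conv2d_weight_shape(16, 32, (3, 3), groups=16)

print(default_shape)  # (32, 16, 3, 3)
print(grouped_shape)  # (32, 1, 3, 3)
```

With groups=in_channels, each output channel only sees a single input channel, so the grouped layer holds 16 times fewer weights than the default one and does not compute the same function.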

Edit: Even more embarrassing, in my test code I was feeding my inputs into both CNN layers instead of daisy-chaining them. When I actually run the code as written above, the second CNN layer in TensorFlow does in fact have weights with size (3, 3, 16, 32). That is the same number of weights as PyTorch's (32, 16, 3, 3), just stored in a different axis order.

But at least I learned about grouping.
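Since the TensorFlow kernel is laid out as (height, width, in_channels, out_channels) and the PyTorch weight as (out_channels, in_channels, height, width), copying weights between the two models comes down to permuting the axes. Below is a minimal NumPy sketch of that permutation; the function names tf_kernel_to_torch and torch_weight_to_tf are hypothetical, and the array is filled with dummy values rather than real trained weights.

```python
import numpy as np


def tf_kernel_to_torch(kernel_hwio):
    """Reorder a TensorFlow conv2d kernel (H, W, in, out) to PyTorch's (out, in, H, W)."""
    return np.ascontiguousarray(np.transpose(kernel_hwio, (3, 2, 0, 1)))


def torch_weight_to_tf(weight_oihw):
    """Reorder a PyTorch Conv2d weight (out, in, H, W) to TensorFlow's (H, W, in, out)."""
    return np.ascontiguousarray(np.transpose(weight_oihw, (2, 3, 1, 0)))


# Dummy stand-in for the values of 'cnn_2/kernel:0', with the shape from the edit above.
tf_kernel = np.arange(3 * 3 * 16 * 32, dtype=np.float32).reshape(3, 3, 16, 32)

torch_weight = tf_kernel_to_torch(tf_kernel)
round_trip = torch_weight_to_tf(torch_weight)

print(torch_weight.shape)  # (32, 16, 3, 3)
print(round_trip.shape)    # (3, 3, 16, 32)
```

The converted array can then be wrapped with torch.from_numpy and copied into the layer's weight. When comparing layer outputs, keep in mind that tf.layers.conv2d expects channels-last (NHWC) inputs by default while Conv2d expects channels-first (NCHW), so the inputs and outputs need a matching permutation as well.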

