
Pytorch Conv1D gives different size to ConvTranspose1d

I am trying to build a basic/shallow CNN auto-encoder for 1D time series data in pytorch/pytorch-lightning.

Currently, my encoding block is:

import torch
import torch.nn as nn

class encodingBlock(nn.Module):
    def __init__(self):
        super().__init__()
                
        self.conv1d_1 = nn.Conv1d(1, 64, kernel_size=32)
        self.relu = nn.ReLU()
        self.batchnorm = nn.BatchNorm1d(64)
        self.maxpool = nn.MaxPool1d(kernel_size=2, stride=2, return_indices=True)
        self.fc = nn.Linear(64, 4)

    def forward(self, x):
        cnn_out1 = self.conv1d_1(x)
        norm_out1 = self.batchnorm(cnn_out1)
        relu_out1 = self.relu(norm_out1)
        maxpool_out, indices = self.maxpool(relu_out1)
        gap_out = torch.mean(maxpool_out, dim = 2)
        fc_out = self.relu(self.fc(gap_out))
        return fc_out, indices

And my decoding block is:

class decodingBlock(nn.Module):
    def __init__(self):
        super().__init__()
                
        self.Tconv1d_1 = nn.ConvTranspose1d(64, 1, kernel_size=32, output_padding=1)
        self.relu = nn.ReLU()
        self.batchnorm = nn.BatchNorm1d(1)
        self.maxunpool = nn.MaxUnpool1d(kernel_size=2, stride=2)
        self.upsamp = nn.Upsample(size=59, mode='nearest')
        self.fc = nn.Linear(4, 64)

    def forward(self, x, indices):
        fc_out = self.fc(x)
        relu_out = self.relu(fc_out)
        relu_out = relu_out.unsqueeze(dim = 2)
        upsamp_out = self.upsamp(relu_out)
        maxpool_out = self.maxunpool(upsamp_out, indices)
        cnnT_out = self.Tconv1d_1(maxpool_out)
        norm_out = self.batchnorm(cnnT_out)
        relu_out = self.relu(norm_out)            
        return relu_out

However, looking at the outputs:

Input size: torch.Size([1, 1, 150])
Conv1D out size: torch.Size([1, 64, 119])
Maxpool out size: torch.Size([1, 64, 59])
Global average pooling out size: torch.Size([1, 64])
Encoder dense out size: torch.Size([1, 4])
...
Decoder input: torch.Size([1, 4])
Decoder dense out size: torch.Size([1, 64])
Unsqueeze out size: torch.Size([1, 64, 1])
Upsample out size: torch.Size([1, 64, 59])
Decoder maxunpool out size: torch.Size([1, 64, 118])
Transpose Conv out size: torch.Size([1, 1, 149])

The outputs of the MaxUnpool1d and ConvTranspose1d layers do not have the expected dimensions.

I have two questions that I was hoping to get some help on:

  1. Why are the dimensions wrong?
  2. Is there a better way to "reverse" the global average pooling than the upsampling procedure I have used?

1. Regarding input and output shapes:
PyTorch's docs have the explicit formulas relating input and output sizes.

For convolution (Conv1d):

L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

Similarly for pooling (MaxPool1d):

L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)

For transposed convolution (ConvTranspose1d):

L_out = (L_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1

And for unpooling (MaxUnpool1d):

L_out = (L_in - 1)*stride - 2*padding + kernel_size

Make sure your padding and output_padding values add up to the proper output shape: in your case the odd Conv1d output (119) is floored to 59 by MaxPool1d, so MaxUnpool1d only recovers 118 samples, and the transposed convolution then ends at 149 instead of 150.
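
As a quick sanity check, here is a small sketch (the helper names are mine) that just evaluates those formulas for the shapes in your question:

import math

def conv1d_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # output length formula shared by Conv1d and MaxPool1d
    return math.floor((l_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

def convtranspose1d_len(l_in, kernel_size, stride=1, padding=0, dilation=1, output_padding=0):
    # output length formula for ConvTranspose1d
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

def maxunpool1d_len(l_in, kernel_size, stride=None, padding=0):
    # output length formula for MaxUnpool1d
    stride = kernel_size if stride is None else stride
    return (l_in - 1) * stride - 2 * padding + kernel_size

print(conv1d_len(150, 32))               # 119  Conv1d(1, 64, kernel_size=32)
print(conv1d_len(119, 2, stride=2))      # 59   MaxPool1d(2, stride=2): 119 is odd, one sample is dropped
print(maxunpool1d_len(59, 2, stride=2))  # 118  MaxUnpool1d(2, stride=2): the dropped sample never returns
print(convtranspose1d_len(118, 32))      # 149  ConvTranspose1d(64, 1, kernel_size=32): one short of 150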

2. Is there a better way?
Transposed convolution has its faults, as you have already noticed. It also tends to produce "checkerboard artifacts".

One solution is to use pixel shuffle: that is, predict twice the number of channels for each low-resolution point, and then "shuffle" those channels into two consecutive points with the desired number of features.
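
Note that torch.nn.PixelShuffle is only defined for 2D feature maps, so for a 1D signal you would have to write the rearrangement yourself. A minimal sketch of a 1D analogue (the module name is my own) could look like this:

import torch
import torch.nn as nn

class PixelShuffle1d(nn.Module):
    # Rearranges (N, C*r, L) -> (N, C, L*r), a 1D analogue of nn.PixelShuffle
    def __init__(self, upscale_factor):
        super().__init__()
        self.r = upscale_factor

    def forward(self, x):
        n, c_r, l = x.shape
        c = c_r // self.r
        x = x.view(n, c, self.r, l)         # split channels into (c, r)
        x = x.permute(0, 1, 3, 2)           # (n, c, l, r)
        return x.reshape(n, c, l * self.r)  # interleave the r sub-channels along the length

# predict 2x the channels with a regular convolution, then double the length
upsample = nn.Sequential(nn.Conv1d(64, 128, kernel_size=3, padding=1), PixelShuffle1d(2))
print(upsample(torch.randn(1, 64, 59)).shape)   # torch.Size([1, 64, 118])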

Alternatively, you can interpolate using a fixed method from the low resolution to the higher one, and then apply regular convolutions to the upsampled vectors. If you choose this path, you might consider using ResizeRight instead of pytorch's interpolate - it has better handling of edge cases.
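
A minimal sketch of that second option using pytorch's own interpolate followed by a regular convolution (the module name and layer sizes are just placeholders chosen to match the shapes in your question):

import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleConv1d(nn.Module):
    # Interpolate to a fixed target length, then refine with a regular convolution
    def __init__(self, in_channels, out_channels, target_len):
        super().__init__()
        self.target_len = target_len
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = F.interpolate(x, size=self.target_len, mode='linear', align_corners=False)
        return self.conv(x)

dec = UpsampleConv1d(64, 1, target_len=150)
print(dec(torch.randn(1, 64, 59)).shape)   # torch.Size([1, 1, 150])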
