I am trying to build a basic/shallow CNN auto-encoder for 1D time series data in pytorch/pytorch-lightning.
Currently, my encoding block is:
    import torch
    import torch.nn as nn

    class encodingBlock(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1d_1 = nn.Conv1d(1, 64, kernel_size=32)
            self.relu = nn.ReLU()
            self.batchnorm = nn.BatchNorm1d(64)
            # return_indices=True so the decoder can unpool with the same indices
            self.maxpool = nn.MaxPool1d(kernel_size=2, stride=2, return_indices=True)
            self.fc = nn.Linear(64, 4)

        def forward(self, x):
            cnn_out1 = self.conv1d_1(x)
            norm_out1 = self.batchnorm(cnn_out1)
            relu_out1 = self.relu(norm_out1)
            maxpool_out, indices = self.maxpool(relu_out1)
            # Global average pooling over the time axis
            gap_out = torch.mean(maxpool_out, dim=2)
            fc_out = self.relu(self.fc(gap_out))
            return fc_out, indices
And my decoding block is:
    class decodingBlock(nn.Module):
        def __init__(self):
            super().__init__()
            self.Tconv1d_1 = nn.ConvTranspose1d(64, 1, kernel_size=32, output_padding=1)
            self.relu = nn.ReLU()
            self.batchnorm = nn.BatchNorm1d(1)
            self.maxunpool = nn.MaxUnpool1d(kernel_size=2, stride=2)
            self.upsamp = nn.Upsample(size=59, mode='nearest')
            self.fc = nn.Linear(4, 64)

        def forward(self, x, indices):
            fc_out = self.fc(x)
            relu_out = self.relu(fc_out)
            relu_out = relu_out.unsqueeze(dim=2)
            upsamp_out = self.upsamp(relu_out)
            maxpool_out = self.maxunpool(upsamp_out, indices)
            cnnT_out = self.Tconv1d_1(maxpool_out)
            norm_out = self.batchnorm(cnnT_out)
            relu_out = self.relu(norm_out)
            return relu_out
However, looking at the outputs:
Input size: torch.Size([1, 1, 150])
Conv1D out size: torch.Size([1, 64, 119])
Maxpool out size: torch.Size([1, 64, 59])
Global average pooling out size: torch.Size([1, 64])
Encoder dense out size: torch.Size([1, 4])
...
Decoder input: torch.Size([1, 4])
Decoder dense out size: torch.Size([1, 64])
Unsqueeze out size: torch.Size([1, 64, 1])
Upsample out size: torch.Size([1, 64, 59])
Decoder maxunpool out size: torch.Size([1, 64, 118])
Transpose Conv out size: torch.Size([1, 1, 149])
The outputs of the MaxUnpool1d and ConvTranspose1d layers do not have the expected sizes: the unpooled length is 118 rather than 119, and the reconstructed length is 149 rather than 150.
I have two questions that I was hoping to get some help on:
1. Regarding input and output shapes:
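PyTorch's documentation gives an explicit formula relating input and output sizes for each layer. For convolution (Conv1d):
    L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
Similarly for pooling (MaxPool1d):
    L_out = floor((L_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
For transposed convolution (ConvTranspose1d):
    L_out = (L_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1
And for unpooling (MaxUnpool1d):
    L_out = (L_in - 1)*stride - 2*padding + kernel_size
Note the floor in the pooling formula: pooling 119 samples with kernel_size=2 and stride=2 gives 59, and unpooling 59 gives (59 - 1)*2 + 2 = 118, so the odd sample is lost.
Make sure your padding and output_padding values add up to the proper output shape.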
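To make the size bookkeeping concrete, here is a minimal sketch that applies the documented length formulas to the sizes in the question and then checks them on real tensors. The helper names (`conv_or_pool_len`, `conv_transpose_len`, `unpool_len`) are made up for illustration. The sketch also assumes you are free to pass `output_size` to `MaxUnpool1d` and to drop `output_padding` from the transposed convolution; that is one possible way to get back to 119 and 150, not necessarily the only one.

```python
import torch
import torch.nn as nn


def conv_or_pool_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    # Conv1d / MaxPool1d: floor((L_in + 2p - d*(k - 1) - 1) / s + 1)
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_len(l_in, kernel_size, stride=1, padding=0,
                       dilation=1, output_padding=0):
    # ConvTranspose1d: (L_in - 1)*s - 2p + d*(k - 1) + output_padding + 1
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def unpool_len(l_in, kernel_size, stride, padding=0):
    # MaxUnpool1d default output length: (L_in - 1)*s - 2p + k
    return (l_in - 1) * stride - 2 * padding + kernel_size


# Sizes from the question, computed by hand
conv_len = conv_or_pool_len(150, kernel_size=32)                 # 119
pool_len = conv_or_pool_len(conv_len, kernel_size=2, stride=2)   # 59
default_unpool_len = unpool_len(pool_len, kernel_size=2, stride=2)  # 118
print(conv_len, pool_len, default_unpool_len)

# The same layers on a real tensor
x = torch.randn(1, 1, 150)
conv = nn.Conv1d(1, 64, kernel_size=32)
pool = nn.MaxPool1d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool1d(kernel_size=2, stride=2)
tconv = nn.ConvTranspose1d(64, 1, kernel_size=32)  # no output_padding here

conv_out = conv(x)
pooled, indices = pool(conv_out)

# output_size resolves the ambiguity introduced by the floor in pooling:
# both 118 and 119 pool down to 59, so we tell unpool which one we want.
restored = unpool(pooled, indices, output_size=conv_out.size())
recon = tconv(restored)
print(restored.shape, recon.shape)
```

In a real encoder/decoder pair you would have to carry the pre-pooling size from the encoder to the decoder alongside the indices, for example by returning it from the encoder's `forward`.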
2. Is there a better way?
Transposed convolution has its faults, as you already noticed. It also tends to produce "checkerboard artifacts".
One solution is to use pixelshuffle: for each low-resolution point, predict twice the desired number of channels, and then split them into two adjacent points, each with the desired number of features.
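As a rough illustration of that idea in 1D, the sketch below rearranges channels into time steps by hand. The names `pixel_shuffle_1d` and `PixelShuffleUp1d` are hypothetical, and the channel-to-position ordering is my own choice (it mirrors the convention of the 2D `nn.PixelShuffle`), so treat it as a sketch rather than a reference implementation.

```python
import torch
import torch.nn as nn


def pixel_shuffle_1d(x, upscale_factor):
    # (N, C * r, L) -> (N, C, L * r): each group of r channels becomes
    # r consecutive time steps.
    n, c, length = x.shape
    c_out = c // upscale_factor
    x = x.reshape(n, c_out, upscale_factor, length)
    x = x.permute(0, 1, 3, 2)  # (N, C, L, r)
    return x.reshape(n, c_out, length * upscale_factor)


class PixelShuffleUp1d(nn.Module):
    """Regular convolution that predicts r times the channels, then shuffles."""

    def __init__(self, in_channels, out_channels, upscale_factor=2, kernel_size=3):
        super().__init__()
        self.upscale_factor = upscale_factor
        # Odd kernel_size with padding=kernel_size // 2 keeps the length unchanged
        self.conv = nn.Conv1d(in_channels, out_channels * upscale_factor,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        return pixel_shuffle_1d(self.conv(x), self.upscale_factor)


up = PixelShuffleUp1d(in_channels=64, out_channels=32, upscale_factor=2)
low_res = torch.randn(1, 64, 59)
high_res = up(low_res)
print(high_res.shape)
```

Because the output length is always an exact multiple of the input length, an odd target such as 119 still needs a final crop, pad, or resize step.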
Alternatively, you can interpolate from the low resolution to the higher one using a fixed method, and then apply regular convolutions to the upsampled vectors. If you choose this path, you might consider using ResizeRight instead of PyTorch's interpolate: it has better handling of edge cases.
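A minimal sketch of the interpolate-then-convolve route is shown here. It uses PyTorch's own `torch.nn.functional.interpolate` rather than ResizeRight, purely to keep the example self-contained. The module name `UpsampleConv1d`, the linear interpolation mode, and the kernel size are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpsampleConv1d(nn.Module):
    """Fixed (non-learned) upsampling followed by a regular convolution."""

    def __init__(self, in_channels, out_channels, out_len, kernel_size=3):
        super().__init__()
        self.out_len = out_len
        # Odd kernel_size with padding=kernel_size // 2 keeps the length unchanged
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Resize directly to the target length, so odd sizes are not a problem
        x = F.interpolate(x, size=self.out_len, mode="linear", align_corners=False)
        return self.conv(x)


decoder_head = UpsampleConv1d(in_channels=64, out_channels=1, out_len=150)
features = torch.randn(1, 64, 59)
recon = decoder_head(features)
print(recon.shape)
```

Since the target length is passed explicitly, the output size no longer depends on padding or output_padding arithmetic.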