
Pytorch identifying batch size as number of channels in Conv2d layer

I am a total newbie to neural networks, using PyTorch to create a VAE model. I've used a bit of TensorFlow before, but I have no idea what "in_channels" and "out_channels" are as arguments to nn.Conv2d/nn.Conv1d.

Disclaimers aside: currently, my model takes in a dataloader with batch size 128, where each input is a 248 by 46 tensor (so, a 128 x 248 x 46 tensor).

My encoder looks like this right now -- I chopped it down so I could focus on where the error was coming from.

import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, latent_dim):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dim
        self.conv1 = nn.Conv2d(in_channels=248, out_channels=46, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print(x.size())
        x = F.relu(self.conv1(x))
        return x

The Conv2d layer was meant to reduce the 248 by 46 input to a 50 by 46 tensor. However, I get this error:

RuntimeError: Given groups=1, weight of size [46, 248, 9, 9], expected input[1, 128, 248, 46] to have 248 channels, but got 128 channels instead

...even though when I print x.size() it displays torch.Size([128, 248, 46]).

I am unsure a) why the error shows the layer adding an extra dimension to x, and b) whether I am even understanding channels correctly. Should 46 be the real number of channels? Why doesn't PyTorch simply request my input size as a tuple, like in=(248, 46)? Or c) whether this is an issue with the way I loaded my data into the model. I have a numpy array data of shape (-1, 248, 46), and I started training my model as follows.

import torch
from torch.utils.data import TensorDataset, DataLoader

tensor_data = torch.from_numpy(data)
dataset = TensorDataset(tensor_data, tensor_data)
train_loader = DataLoader(dataset, batch_size=128, shuffle=True)
...
for epoch in range(20):
    for x_train, y_train in train_loader:
        x_train = x_train.to(device).float()
        y_train = y_train.to(device).float()  # targets must live on the same device as the model output
        optimizer.zero_grad()
        x_pred, mu, log_var = vae(x_train)
        bce_loss = train.BCE(y_train, x_pred)
        kl_loss = train.KL(mu, log_var)
        loss = bce_loss + kl_loss
        loss.backward()
        optimizer.step()

Any thoughts appreciated!

Suppose your model takes a single-channel 28*28 image; flattened, that becomes 784, which is your in_channel, and out_channels is the number of classes your model wants to predict.

In PyTorch, nn.Conv2d assumes the input (usually image data) is shaped [B, C_in, H, W], where B is the batch size, C_in is the number of channels, and H and W are the height and width of the image. The output has a similar shape [B, C_out, H_out, W_out]. Here, C_in and C_out are in_channels and out_channels, respectively. (H_out, W_out) is the output image size, which may or may not equal (H, W), depending on the kernel size, the stride, and the padding. This also answers your question a): given a 3-D input, Conv2d treats it as a single unbatched [C, H, W] image and prepends a batch dimension of 1, which is why the error message reports input[1, 128, 248, 46] and takes your batch size of 128 to be the channel count.
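For instance, here is a minimal sketch (my own example, with arbitrary layer sizes) of the shape convention; with dilation 1, the spatial output size follows out = (in + 2*padding - kernel_size) // stride + 1:

import torch
import torch.nn as nn

x = torch.rand(4, 3, 32, 32)  # [B=4, C_in=3, H=32, W=32]
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([4, 16, 16, 16])
# H_out = W_out = (32 + 2*1 - 3) // 2 + 1 = 16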

However, it is confusing to apply conv2d to reduce [128, 248, 46] inputs to [128, 50, 46]. Are they image data with height 248 and width 46? If so, you can reshape the inputs to [128, 1, 248, 46] and use in_channels=1 and out_channels=1 in conv2d.
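As a quick check (reusing the kernel size, stride, and padding from the question), that single-channel layout produces exactly the 50 by 46 output you were aiming for:

import torch
import torch.nn as nn

x = torch.rand(128, 1, 248, 46)  # [B, C_in=1, H=248, W=46]
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(9, 9),
                 stride=(5, 1), padding=(5, 4))
print(conv(x).shape)  # torch.Size([128, 1, 50, 46])
# height: (248 + 2*5 - 9) // 5 + 1 = 50
# width:  (46 + 2*4 - 9) // 1 + 1 = 46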

You need to add an extra dimension for the number of channels (1) with the view function. The code below will work!

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=(9, 9), stride=(5, 1), padding=(5, 4))

    def forward(self, x):
        print("encoder input size: " + str(x.shape))
        # x arrives as [number of samples in a batch, 248, 46];
        # insert a channel dimension of 1 so the shape becomes
        # (number of samples in a batch, number of channels, height, width)
        x = x.view(x.shape[0], 1, 248, 46)
        print("encoder input size after adding 1 channel to shape: " + str(x.shape))
        x = F.relu(self.conv1(x))
        return x

# a test dataset with 128 samples, each of height 248 and width 46
test_dataset = torch.rand(128, 248, 46)
# print the shape of the dataset
print(test_dataset.shape)

model = Encoder()
model(test_dataset)

# if you are passing only one sample to the model (e.g. to plot), you need to do this instead
test_dataset2 = torch.rand(1, 248, 46)
model(test_dataset2.view(test_dataset2.shape[0], 1, 248, 46))
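As a side note (my suggestion, not part of the answer above): unsqueeze(1) inserts the channel dimension without hard-coding the height and width, and it works the same for batched and single-sample inputs:

import torch

x_batch = torch.rand(128, 248, 46)
x_single = torch.rand(1, 248, 46)
print(x_batch.unsqueeze(1).shape)   # torch.Size([128, 1, 248, 46])
print(x_single.unsqueeze(1).shape)  # torch.Size([1, 1, 248, 46])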
