How does the output of a convolutional GAN discriminator work, and can it have a fully connected layer?

Question

I'm building a DCGAN and have a problem with the shape of the discriminator output: it does not match the shape of the labels when I try to calculate the BCE loss.

To generate the discriminator output, do I have to use convolutions all the way down, or can I add a Linear layer at some point to get the shape I want?

In other words, do I have to reduce the shape by adding more convolutional layers, or can I add a fully connected one? I thought the discriminator should have a fully connected layer, but in every tutorial I checked it had none.

import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as torch_dataset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

seed = 1
print("Random Seed: ", seed)
random.seed(seed)
torch.manual_seed(seed)
images_folder_path = "./spectrograms/"

batch_size = 1
image_size = 256
n_channels = 1
z_vector = 100
n_features_generator = 32
n_features_discriminator = 32
num_epochs = 5
lr = 0.0002
beta1 = 0.5

dataset = torch_dataset.ImageFolder(
    root=images_folder_path, transform=transforms.Compose(
        [
            transforms.Grayscale(num_output_channels=1),
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize(0.5, 0.5)
         ]
    )
)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

device = torch.device("cuda:0" if (torch.cuda.is_available()) else "cpu")


def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(z_vector, n_features_generator * 8, 4, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 8, n_features_generator * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 4, n_features_generator * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 2, n_features_generator, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator, n_channels, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, inputs):
        return self.main(inputs)

# Conv2d output size per spatial dimension = floor((W - K + 2P) / S) + 1
# W is the input size (height or width)
# K is the kernel size
# P is the padding
# S is the stride
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(n_channels, n_features_discriminator, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator, n_features_discriminator * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 2, n_features_discriminator * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 4, n_features_discriminator * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 8, 1, 4, 1, bias=False),
        )

    def forward(self, inputs):
        return self.main(inputs)


netG = Generator().to(device)
if device.type == 'cuda':
    netG = nn.DataParallel(netG)
netG.apply(weights_init)
print(netG)

netD = Discriminator().to(device)
if device.type == 'cuda':
    netD = nn.DataParallel(netD)
netD.apply(weights_init)
print(netD)

criterion = nn.BCEWithLogitsLoss()

fixed_noise = torch.randn(64, z_vector, 1, 1, device=device)

real_label = 1.
fake_label = 0.

optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
for epoch in range(num_epochs):
    for i, data in enumerate(dataloader, 0):
        netD.zero_grad()
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        output = netD(real_cpu)
        print(output.shape)
        print(label.shape)
        output = output.view(-1)
        errD_real = criterion(output, label)
        errD_real.backward()
        D_x = output.mean().item()

        noise = torch.randn(b_size, z_vector, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(fake_label)
        output = netD(fake.detach()).view(-1)
        errD_fake = criterion(output, label)
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        errD = errD_real + errD_fake
        optimizerD.step()

        netG.zero_grad()
        label.fill_(real_label)
        output = netD(fake).view(-1)
        errG = criterion(output, label)
        errG.backward()
        D_G_z2 = output.mean().item()
        optimizerG.step()

        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        G_losses.append(errG.item())
        D_losses.append(errD.item())

        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1

The error I'm getting:

Traceback (most recent call last):
  File "G:/Pastas Estruturadas/Conhecimento/CEFET/IA/SpectroGAN/dcgan.py", line 140, in <module>
    errD_real = criterion(output, label)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\modules\loss.py", line 631, in forward
    reduction=self.reduction)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\functional.py", line 2538, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1])) must be the same as input size (torch.Size([169]))

The shape of output is torch.Size([1, 1, 13, 13]), and the shape of label is torch.Size([1]).
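The 13 x 13 map follows from the output-size formula in the code comment above. Here is a minimal sketch of the arithmetic, assuming square inputs, dilation 1, and the discriminator's layer settings as posted. The names `conv_out`, `layers`, and `sizes` are introduced here for illustration only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for the five Conv2d layers of the posted Discriminator
layers = [(4, 2, 1)] * 4 + [(4, 1, 0)]

size = 256  # image_size in the post
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [256, 128, 64, 32, 16, 13]
```

Flattening the final 13 x 13 map with `view(-1)` gives 169 values, which is the input size reported in the ValueError.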

Answer 1

The DCGAN paper describes a concrete architecture in which Conv layers downsample the feature maps. If you design your Conv layers carefully, you can do without a Linear layer, but that does not mean a Linear layer will not work for downsampling (especially as the very last layer). The DCGAN paper simply found that Conv layers worked better than Linear layers for downsampling.
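To illustrate the Linear-layer option, here is a minimal sketch, not the DCGAN paper's design: it keeps the four stride-2 Conv blocks from the question and replaces the last Conv with `Flatten` plus a single `Linear` layer. The class name `LinearHeadDiscriminator` and its constructor arguments are hypothetical, and the input is assumed to be square with a side divisible by 16.

```python
import torch
import torch.nn as nn


class LinearHeadDiscriminator(nn.Module):
    """Hypothetical variant: conv feature extractor followed by one Linear layer."""

    def __init__(self, n_channels=1, nf=32, image_size=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, nf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(nf, nf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(nf * 2, nf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(nf * 4, nf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(nf * 8),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Four stride-2 convolutions halve the side four times: 256 -> 16
        side = image_size // 16
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(nf * 8 * side * side, 1),  # one logit per image
        )

    def forward(self, inputs):
        # Shape (batch,), matching labels built with torch.full((b_size,), ...)
        return self.head(self.features(inputs)).view(-1)


netD = LinearHeadDiscriminator()
logits = netD(torch.randn(2, 1, 256, 256))
print(logits.shape)  # torch.Size([2])
```

The output is a raw logit, so it pairs with `nn.BCEWithLogitsLoss` as used in the question. With `nn.BCELoss`, a `Sigmoid` would be needed at the end.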

If you want to keep this architecture, you can change the kernel size, padding, or stride so that the last layer yields exactly one value. Refer to the PyTorch documentation on Conv layers to see what the output size will be for a given input size.
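One way to keep the discriminator fully convolutional for 256 x 256 inputs is to add two more stride-2 blocks, so the feature map reaches 4 x 4 before the final 4 x 4 convolution. The sketch below assumes that approach. `down_block` and `ConvDiscriminator256` are hypothetical names, and capping the extra blocks at `nf * 8` channels is an arbitrary choice made to keep the model small.

```python
import torch
import torch.nn as nn


def down_block(in_ch, out_ch):
    """Conv block with 4x4 kernel, stride 2, padding 1: halves height and width."""
    return [
        nn.Conv2d(in_ch, out_ch, 4, 2, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    ]


class ConvDiscriminator256(nn.Module):
    """Hypothetical all-convolutional discriminator for 256 x 256 inputs."""

    def __init__(self, n_channels=1, nf=32):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv2d(n_channels, nf, 4, 2, 1, bias=False),  # 256 -> 128
            nn.LeakyReLU(0.2, inplace=True),
            *down_block(nf, nf * 2),       # 128 -> 64
            *down_block(nf * 2, nf * 4),   # 64 -> 32
            *down_block(nf * 4, nf * 8),   # 32 -> 16
            *down_block(nf * 8, nf * 8),   # 16 -> 8 (added block)
            *down_block(nf * 8, nf * 8),   # 8 -> 4 (added block)
            nn.Conv2d(nf * 8, 1, 4, 1, 0, bias=False),  # 4 -> 1, one logit
        )

    def forward(self, inputs):
        # (batch, 1, 1, 1) -> (batch,)
        return self.main(inputs).view(-1)


netD = ConvDiscriminator256()
logits = netD(torch.randn(2, 1, 256, 256))
print(logits.shape)  # torch.Size([2])
```

Note that the Generator as posted appears to produce 64 x 64 images (five transposed convolutions starting from a 1 x 1 input), so it would likely need two more upsampling blocks before its output matches a discriminator built for 256 x 256 inputs.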

Answer 1: 1 accepted, 2021-03-14 03:36:01