
How does the output of the Discriminator of a Convolutional Generative Adversarial Network work, can it have a Fully Connected Layer?

I'm building a DCGAN, and I'm having a problem with the shape of the output: it doesn't match the shape of the labels when I try to calculate the BCELoss.

To generate the discriminator output, do I have to use convolutions all the way down, or can I add a Linear layer at some point to match the shape I want?

I mean, do I have to reduce the shape by adding more convolutional layers, or can I add a fully connected one? I thought it should have a fully connected layer, but in every tutorial I checked, the discriminator had no fully connected layer.

import random
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as torch_dataset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

seed = 1
print("Random Seed: ", seed)
random.seed(seed)
torch.manual_seed(seed)
images_folder_path = "./spectrograms/"

batch_size = 1
image_size = 256
n_channels = 1
z_vector = 100
n_features_generator = 32
n_features_discriminator = 32
num_epochs = 5
lr = 0.0002
beta1 = 0.5

dataset = torch_dataset.ImageFolder(
    root=images_folder_path, transform=transforms.Compose(
        [
            transforms.Grayscale(num_output_channels=1),
            transforms.Resize(image_size),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize(0.5, 0.5)
         ]
    )
)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0)

device = torch.device("cuda:0" if (torch.cuda.is_available()) else "cpu")


# Custom weight initialization from the DCGAN paper: draw weights from N(0, 0.02)
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(z_vector, n_features_generator * 8, 4, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 8, n_features_generator * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 4, n_features_generator * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator * 2, n_features_generator, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_generator),
            nn.ReLU(True),
            nn.ConvTranspose2d(n_features_generator, n_channels, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, inputs):
        return self.main(inputs)

# Convolutional Layer Output Shape = [(W−K+2P)/S]+1
# W is the input volume
# K is the Kernel size
# P is the padding
# S is the stride
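# For this discriminator with a 256x256 input, the four (K=4, S=2, P=1)
# convolutions give 256 -> 128 -> 64 -> 32 -> 16, and the final
# (K=4, S=1, P=0) convolution gives [(16 - 4 + 0)/1] + 1 = 13 --
# which is where the 13x13 output reported below comes from.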
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(n_channels, n_features_discriminator, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator, n_features_discriminator * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 2, n_features_discriminator * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 4, n_features_discriminator * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 8, 1, 4, 1, bias=False),
        )

    def forward(self, inputs):
        return self.main(inputs)


netG = Generator().to(device)
if device.type == 'cuda':
    netG = nn.DataParallel(netG)
netG.apply(weights_init)
print(netG)

netD = Discriminator().to(device)
if device.type == 'cuda':
    netD = nn.DataParallel(netD)
netD.apply(weights_init)
print(netD)

criterion = nn.BCEWithLogitsLoss()

fixed_noise = torch.randn(64, z_vector, 1, 1, device=device)

real_label = 1.
fake_label = 0.

optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))

img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
for epoch in range(num_epochs):
    for i, data in enumerate(dataloader, 0):
        # (1) Update the discriminator with an all-real batch
        netD.zero_grad()
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        output = netD(real_cpu)
        print(output.shape)
        print(label.shape)
        output = output.view(-1)
        errD_real = criterion(output, label)
        errD_real.backward()
        D_x = output.mean().item()

        # (2) Update the discriminator with an all-fake batch
        noise = torch.randn(b_size, z_vector, 1, 1, device=device)
        fake = netG(noise)
        label.fill_(fake_label)
        output = netD(fake.detach()).view(-1)
        errD_fake = criterion(output, label)
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        errD = errD_real + errD_fake
        optimizerD.step()

        # (3) Update the generator: maximize log(D(G(z)))
        netG.zero_grad()
        label.fill_(real_label)
        output = netD(fake).view(-1)
        errG = criterion(output, label)
        errG.backward()
        D_G_z2 = output.mean().item()
        optimizerG.step()

        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        G_losses.append(errG.item())
        D_losses.append(errD.item())

        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1

The error I'm getting:

Traceback (most recent call last):
  File "G:/Pastas Estruturadas/Conhecimento/CEFET/IA/SpectroGAN/dcgan.py", line 140, in <module>
    errD_real = criterion(output, label)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\modules\loss.py", line 631, in forward
    reduction=self.reduction)
  File "C:\Users\Ramon\anaconda3\envs\vision\lib\site-packages\torch\nn\functional.py", line 2538, in binary_cross_entropy_with_logits
    raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([1])) must be the same as input size (torch.Size([169]))

The shape of output is torch.Size([1, 1, 13, 13]), and the shape of label is torch.Size([1]).

The DCGAN paper describes a concrete architecture in which Conv layers are used to downsample the feature maps. If you design your Conv layers carefully, you can do without a Linear layer, but that does not mean it will not work if you use a Linear layer to downsample (especially as the very last layer). The DCGAN authors simply found that using Conv layers instead of Linear layers for downsampling worked better.
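For example, here is a minimal sketch (not from the original post; DiscriminatorFC is an illustrative name) of the question's discriminator with the final convolution replaced by a fully connected head. It assumes the 256x256 inputs used in the question, so the four stride-2 convolutions leave an (n_features_discriminator * 8) x 16 x 16 feature map:

class DiscriminatorFC(nn.Module):
    def __init__(self):
        super(DiscriminatorFC, self).__init__()
        # Same four downsampling blocks as the Discriminator in the question
        self.features = nn.Sequential(
            nn.Conv2d(n_channels, n_features_discriminator, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator, n_features_discriminator * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 2, n_features_discriminator * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(n_features_discriminator * 4, n_features_discriminator * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(n_features_discriminator * 8),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # 256 -> 128 -> 64 -> 32 -> 16 after the four stride-2 convolutions
        self.classifier = nn.Linear(n_features_discriminator * 8 * 16 * 16, 1)

    def forward(self, inputs):
        x = self.features(inputs)
        x = torch.flatten(x, 1)    # (batch, n_features_discriminator * 8 * 16 * 16)
        return self.classifier(x)  # (batch, 1): one logit per image for BCEWithLogitsLoss

With this head, the discriminator output reshaped with .view(-1) has shape torch.Size([b_size]), which matches the label tensor built with torch.full((b_size,), ...).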

If you want to keep this architecture, you can change the kernel size, padding, or stride so that the last layer yields exactly a single value. Refer to the PyTorch documentation on Conv layers to see what the output size is for a given input size.
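Concretely, with the 256x256 real images from the question, the feature map entering the final convolution is 16x16, so a kernel size of 16 with stride 1 and no padding collapses it to a single value: [(16 - 16 + 0)/1] + 1 = 1. A quick shape check of that one-line change (netD_fixed and x are illustrative names):

netD_fixed = Discriminator()
# Swap the last layer of main for a kernel-16 convolution
netD_fixed.main[-1] = nn.Conv2d(n_features_discriminator * 8, 1, 16, 1, 0, bias=False)

x = torch.randn(1, n_channels, image_size, image_size)  # one dummy 256x256 grayscale image
print(netD_fixed(x).shape)           # torch.Size([1, 1, 1, 1])
print(netD_fixed(x).view(-1).shape)  # torch.Size([1]) -- matches the label shape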

