Keras autoencoder input image size

Question

Consider this autoencoder:

import numpy as np

from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Reshape
from keras.models import Model

class ConvAutoencoder:

    def __init__(self, image_size, latent_dim):

        inp = Input(shape=(image_size[0], image_size[1], 1))

        x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
        encoded = MaxPooling2D((2, 2), padding='same')(x)
        # at this point the representation is (4, 4, 8) i.e. 128-dimensional

        d = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(16, (3, 3), activation='relu')(d)
        d = UpSampling2D((2, 2))(d)

        decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)

        self.model = Model(inp, decoded)
        self.encoder = Model(inp, encoded)
        self.model.compile(loss='mse', optimizer='Adam')

        print(self.model.summary())

I instantiate it with

ConvAutoencoder(image_size=(32,32), latent_dim=10)

which prints

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32, 32, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 16)        160       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 8)         1160      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 8)           0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 8)           584       
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 28, 28, 1)         145       
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
None

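As you can see, the input image size is (32,32) while the output image size is (28,28).
* Question 1: How can I change the autoencoder's architecture so that the output image size becomes (32,32)?
* Question 2: As you can see, the class takes a parameter named latent_dim. At the moment this parameter is unused. Is there a simple way to "force" the latent dimension of the autoencoder down to a certain number? For example, by adding a fully connected layer in the middle, or something along those lines?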

Answer 1

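Question 1

Well, you forgot padding='same' on the last Conv2D before the final upsampling. Without it the layer falls back to the default padding='valid', so its 3x3 kernel shrinks the 16x16 feature map to 14x14 (conv2d_6 in the summary), and the final upsampling doubles that to 28x28.

It should look like this: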

        # at this point the representation is (4, 4, 8) i.e. 128-dimensional

        d = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(16, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)

        decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)
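
The size arithmetic behind this fix can be checked without building the model. The following is a minimal sketch in plain Python; the helper names (conv_size, pool_size, decoder_output_size) are made up for illustration and are not Keras functions. It assumes 3x3 kernels, stride 1, and 2x2 pooling and upsampling, as in the question.

```python
import math


def conv_size(size, kernel=3, padding='same'):
    """Spatial size after a stride-1 convolution."""
    if padding == 'same':
        return size
    return size - kernel + 1  # 'valid' padding, the Keras default


def pool_size(size, pool=2):
    """Spatial size after MaxPooling2D with padding='same'."""
    return math.ceil(size / pool)


def decoder_output_size(encoded_size, last_padding):
    """Trace the decoder; last_padding is the padding of the 16-filter Conv2D."""
    size = conv_size(encoded_size)                 # conv2d_4
    size *= 2                                      # up_sampling2d_1
    size = conv_size(size)                         # conv2d_5
    size *= 2                                      # up_sampling2d_2
    size = conv_size(size, padding=last_padding)   # conv2d_6
    size *= 2                                      # up_sampling2d_3
    return conv_size(size)                         # conv2d_7


# Encoder: three (Conv2D 'same' + MaxPooling2D) stages, 32 -> 16 -> 8 -> 4
encoded = 32
for _ in range(3):
    encoded = pool_size(conv_size(encoded))

print(encoded)                                # 4
print(decoder_output_size(encoded, 'valid'))  # 28, as in the printed summary
print(decoder_output_size(encoded, 'same'))   # 32, after the fix
```

The only difference between the two calls is the padding of the 16-filter convolution, which is exactly the line changed above.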

Question 2

Do you mean the number of kernels (filters)? In that case:

        x = Conv2D(latent_dim*4, (3, 3), activation='relu', padding='same')(inp)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(latent_dim*2, (3, 3), activation='relu', padding='same')(x)
        x = MaxPooling2D((2, 2), padding='same')(x)
        x = Conv2D(latent_dim, (3, 3), activation='relu', padding='same')(x)
        encoded = MaxPooling2D((2, 2), padding='same')(x)
        # at this point the representation is (4, 4, latent_dim), i.e. 16 * latent_dim values

        d = Conv2D(latent_dim, (3, 3), activation='relu', padding='same')(encoded)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(latent_dim*2, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)
        d = Conv2D(latent_dim*4, (3, 3), activation='relu', padding='same')(d)
        d = UpSampling2D((2, 2))(d)

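However, if you mean that you want the middle (encoded) layer itself to have a specific number of kernels, you can replace the last Conv2D and MaxPooling2D pair with a single strided Conv2D, like this: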

encoded = Conv2D(latent_dim, (3, 3), activation='relu', padding='same', strides=2)(x)
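
To see that the strided convolution produces the same output shape as the convolution-plus-pooling pair it replaces, here is a small sketch. It assumes latent_dim=10 and an (8, 8, 8) input, which is the shape entering the last encoder stage for a 32x32 image; the variable names are illustrative only.

```python
from keras.layers import Input, Conv2D, MaxPooling2D
from keras.models import Model

latent_dim = 10  # example value, taken from the question

# Shape entering the last encoder stage for a 32x32 input
inp = Input(shape=(8, 8, 8))

# Variant A: convolution followed by pooling, as in the question
a = Conv2D(latent_dim, (3, 3), activation='relu', padding='same')(inp)
a = MaxPooling2D((2, 2), padding='same')(a)

# Variant B: a single convolution with strides=2
b = Conv2D(latent_dim, (3, 3), activation='relu', padding='same', strides=2)(inp)

pooled = Model(inp, a)
strided = Model(inp, b)

print(pooled.output_shape)   # (None, 4, 4, 10)
print(strided.output_shape)  # (None, 4, 4, 10)
```

The shapes match, but the two variants do not compute the same thing: pooling keeps the maximum of each 2x2 window, while the strided convolution skips every other position.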

In fact, you can remove all the MaxPooling2D layers and add strides=2 to the Conv2D layers that preceded them instead.
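
The question also mentions a fully connected layer in the middle. That is not what the answer above does, but one possible way to get a flat code of exactly latent_dim values is to flatten the last feature map, pass it through a Dense layer, and expand it again with Dense and Reshape before decoding. A minimal sketch, assuming the image sides are divisible by 8; the function name build_dense_bottleneck_autoencoder is hypothetical:

```python
from keras.layers import (Input, Dense, Conv2D, MaxPooling2D,
                          UpSampling2D, Flatten, Reshape)
from keras.models import Model


def build_dense_bottleneck_autoencoder(image_size=(32, 32), latent_dim=10):
    # Assumption: both sides are divisible by 8 (three 2x2 poolings)
    h, w = image_size[0] // 8, image_size[1] // 8

    inp = Input(shape=(image_size[0], image_size[1], 1))

    x = Conv2D(16, (3, 3), activation='relu', padding='same')(inp)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)   # (h, w, 8)

    # Bottleneck: a flat vector with exactly latent_dim entries
    x = Flatten()(x)
    encoded = Dense(latent_dim, activation='relu')(x)

    # Expand back to the feature-map shape the decoder expects
    d = Dense(h * w * 8, activation='relu')(encoded)
    d = Reshape((h, w, 8))(d)

    d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    d = Conv2D(8, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    d = Conv2D(16, (3, 3), activation='relu', padding='same')(d)
    d = UpSampling2D((2, 2))(d)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(d)

    autoencoder = Model(inp, decoded)
    encoder = Model(inp, encoded)
    autoencoder.compile(loss='mse', optimizer='adam')
    return autoencoder, encoder


autoencoder, encoder = build_dense_bottleneck_autoencoder((32, 32), latent_dim=10)
print(encoder.output_shape)      # (None, 10)
print(autoencoder.output_shape)  # (None, 32, 32, 1)
```

The Dense layers add parameters and discard the spatial layout of the feature map, so whether this works better than simply reducing the filter count depends on the data.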

Solution 1
1 accepted 2020-01-12 15:59:39
