Keras Dense layer output shape

Question

I can't understand the logic behind the output shape of the first hidden layer. I tried a few arbitrary examples, shown below.

Example 1:

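from tensorflow.keras.models import Sequential  # imports assumed; not shown in the original snippet
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(units=4,activation='linear',input_shape=(784,)))
model.add(Dense(units=10,activation='softmax'))
model.summary()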

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_7 (Dense)              (None, 4)                 3140      
_________________________________________________________________
dense_8 (Dense)              (None, 10)                50        
=================================================================
Total params: 3,190
Trainable params: 3,190
Non-trainable params: 0

Example 2:

model = Sequential()  # fresh model for this example
model.add(Dense(units=4,activation='linear',input_shape=(784,1)))
model.add(Dense(units=10,activation='softmax'))
model.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_11 (Dense)             (None, 784, 4)            8         
_________________________________________________________________
dense_12 (Dense)             (None, 784, 10)           50        
=================================================================
Total params: 58
Trainable params: 58
Non-trainable params: 0

Example 3:

model = Sequential()  # fresh model for this example
model.add(Dense(units=4,activation='linear',input_shape=(32,28)))
model.add(Dense(units=10,activation='softmax'))
model.summary()
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_15 (Dense)             (None, 32, 4)             116       
_________________________________________________________________
dense_16 (Dense)             (None, 32, 10)            50        
=================================================================
Total params: 166
Trainable params: 166
Non-trainable params: 0

Example 4:

model = Sequential()  # fresh model for this example
model.add(Dense(units=4,activation='linear',input_shape=(32,28,1)))
model.add(Dense(units=10,activation='softmax'))
model.summary()
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_17 (Dense)             (None, 32, 28, 4)         8         
_________________________________________________________________
dense_18 (Dense)             (None, 32, 28, 10)        50        
=================================================================
Total params: 58
Trainable params: 58
Non-trainable params: 0

Please help me understand the logic.

Also, I think input_shape=(784,) and input_shape=(784,1) have the same rank, so why are their output shapes different?

Answer 1

According to the official Keras documentation, when you give a Dense layer input_shape=(input_units,), the model takes input arrays of shape (*, input_units) and outputs arrays of shape (*, output_units). In your case, input_shape=(784,) is treated as input shape (*, 784) and the output shape is (*, 4).

In general, for an input of shape (batch_size, ..., input_dim), the model gives an output of shape (batch_size, ..., units).

So when you pass input_shape=(784,), the model takes input arrays of shape (*, 784), where * is the batch size and 784 is input_dim, and produces an output of shape (*, 4).

When the input is (784,1), the model treats it as (*, 784, 1), where * is the batch size, 784 is the "..." part and 1 is input_dim, i.e. (batch_size, ..., input_dim). The output is (*, 784, 4), i.e. (batch_size, ..., units).

The same holds for input_shape=(32,28) => (*,32,28), which gives output (*,32,4), and for input_shape=(32,28,1) => (*,32,28,1), where * is batch_size, 32,28 is the "..." part and 1 is input_dim, which gives output (*,32,28,4).
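The rule above can be written down as a few lines of plain Python. This is only a sketch of the shape bookkeeping, not Keras code: the helper name dense_output_shape is made up for illustration, and None stands in for the unknown batch size in the same way model.summary() prints it.

```python
def dense_output_shape(input_shape, units):
    """Return the output shape a Dense layer would report for input_shape.

    input_shape excludes the batch axis, like the Keras argument of the same name.
    (Hypothetical helper for illustration; it is not part of Keras.)
    """
    if len(input_shape) < 1:
        raise ValueError("input_shape needs at least one dimension")
    # Prepend the unknown batch size, keep the middle axes, replace the last axis by units.
    return (None, *input_shape[:-1], units)


for shape in [(784,), (784, 1), (32, 28), (32, 28, 1)]:
    hidden = dense_output_shape(shape, units=4)
    # The second Dense layer sees the first layer's output without the batch axis.
    final = dense_output_shape(hidden[1:], units=10)
    print(shape, "->", hidden, "->", final)
```

Each printed pair should line up with the Output Shape column of the corresponding summary in the question.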

As for what None means, see the question "What does 'None' mean in Keras's model.summary?"

Answer 2

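The logic is simple: a dense layer is applied independently to the last dimension of the previous layer. So an input of shape (d1, ..., dn, d) passed through a dense layer with m units yields an output of shape (d1, ..., dn, m), and the layer has d*m+m parameters (m of them are biases).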
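As a quick check of the d*m+m formula, the sketch below recomputes the Param # column for the four examples in the question. The function name dense_param_count is a hypothetical helper, and the count assumes the default use_bias=True.

```python
def dense_param_count(last_dim, units, use_bias=True):
    """Parameters of a Dense layer: a (last_dim, units) kernel plus one bias per unit."""
    return last_dim * units + (units if use_bias else 0)


examples = {
    "Example 1": (784,),
    "Example 2": (784, 1),
    "Example 3": (32, 28),
    "Example 4": (32, 28, 1),
}

for name, input_shape in examples.items():
    # Only the last axis of the input matters for the parameter count.
    first = dense_param_count(input_shape[-1], units=4)
    # The second layer always sees a last axis of size 4 (the first layer's units).
    second = dense_param_count(4, units=10)
    print(f"{name}: {first} + {second} = {first + second}")
```

The totals come out as 3190, 58, 166 and 58, matching the "Total params" lines in the summaries.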

Note that the same weights are applied independently at every position, so your example 4 works like this:

for i in range(32):
    for j in range(28):
        output[i, j, :] = input[i, j, :] @ layer.weights + layer.bias

Here @ is matrix multiplication. input[i, j] is a vector of shape (1,), layer.weights has shape (1,4), and layer.bias is a vector of shape (4,), one bias per unit.
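The loop can be made runnable with NumPy standing in for the layer. This is a sketch under assumptions: x, kernel and bias are random stand-ins rather than real Keras weights, and the batch axis is left out so that a single sample of shape (32, 28, 1) is processed.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 28, 1))   # one sample, batch axis omitted
kernel = rng.normal(size=(1, 4))   # (last input dim, units)
bias = rng.normal(size=(4,))       # one bias per unit

# Explicit loop: the same kernel and bias are reused at every (i, j) position.
looped = np.empty((32, 28, 4))
for i in range(32):
    for j in range(28):
        looped[i, j, :] = x[i, j, :] @ kernel + bias

# A single broadcasted matmul over the last axis does the same thing.
vectorized = x @ kernel + bias

print(vectorized.shape)
print(np.allclose(looped, vectorized))
```

Both versions produce an array of shape (32, 28, 4), which is the (None, 32, 28, 4) from the summary once the batch axis is added back.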

This also explains why (784,) and (784,1) give different results: their last dimensions differ, 784 versus 1.

Answer 3

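A dense layer expects input of shape (batch_size, input_size). Most of the time we leave out batch_size and define it during training.

If your input shape is one-dimensional, as in your first case (784,), the model takes input arrays of shape (~, 784) and outputs arrays of shape (~, 4). By default it adds a bias of size 4 (because there are 4 units). So the total number of parameters is

parameters -> 784*4 + 4 = 3140

If your input shape is two-dimensional, as in your second case (784,1), the model takes input arrays of shape (784,1) and outputs arrays of shape (None, 784, 4). None is the batch dimension. By default it adds a bias of size 4 (because there are 4 units). So the total number of parameters is

parameters -> 1*4 (weights) + 4 (bias) = 8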

Answer 4

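The output shape of a layer depends on the type of layer used. For example, the output shape of a Dense layer is based on the units defined in the layer, whereas the output shape of a Conv layer depends on filters.

Another thing to remember is that, by default, the last dimension of any input is treated as the number of channels. When the output shape is worked out, the number of channels is replaced by the units defined in the layer. For a one-dimensional input such as input_shape=(784,), the trailing , is important.

Example 1 is one-dimensional, example 2 is two-dimensional with channels = 1, example 3 is two-dimensional with channels = 28, and example 4 is three-dimensional with channels = 1. As mentioned above, the last dimension is replaced by the units defined in the Dense layer.

More details on dimensions, axes, channels, input_dim and so on are explained very clearly in this stackoverflow answer.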

Answer 5

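Keras is a high-level API that hides a lot of detail behind its abstractions. The following example may help you understand it better. It is the closest raw TensorFlow equivalent of the Keras code in your question: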

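import tensorflow as tf
from pprint import pprint

# tf.compat.v1.placeholder only works in graph mode, so switch off
# eager execution (which is the default in TF 2.x) before building anything.
tf.compat.v1.disable_eager_execution()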


for shape in [(None,784,), (None, 784,1), (None, 32,28), (None, 32,28,1)]:
    shapes_list = []

    input_layer_1 = tf.compat.v1.placeholder(dtype=tf.float32, shape=shape, name=None)
    shapes_list.append(input_layer_1.shape)
    d1 = tf.compat.v1.layers.dense(
        inputs=input_layer_1, units=4, activation=None, use_bias=True, kernel_initializer=None,
        bias_initializer=tf.zeros_initializer(), kernel_regularizer=None,
        bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
        bias_constraint=None, trainable=True, name=None, reuse=None
    )
    shapes_list.append(d1.shape)
    d2 = tf.compat.v1.layers.dense(
        inputs=d1, units=10, activation=tf.compat.v1.nn.softmax, use_bias=True, kernel_initializer=None,
        bias_initializer=tf.zeros_initializer(), kernel_regularizer=None,
        bias_regularizer=None, activity_regularizer=None, kernel_constraint=None,
        bias_constraint=None, trainable=True, name=None, reuse=None
    )
    shapes_list.append(d2.shape)
    print('++++++++++++++++++++++++++')
    pprint(shapes_list)
    print('++++++++++++++++++++++++++')

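The Dense function is used to build a densely connected layer, or perceptron.

Judging by your code snippet, you have created a multi-layer perceptron (with the linear activation function f(x)=x) in which hidden layer 1 has 4 neurons and the output layer is sized for the 10 classes/labels to be predicted.

The number of neurons in each layer is set by the units argument. The shape of each neuron in layer_L is determined by the output of the previous layer, layer_L-1.

If the input to a Dense layer is (BATCH_SIZE, N, l), then the output shape will be (BATCH_SIZE, N, value_passed_to_argument_units_in_Dense)

If the input is (BATCH_SIZE, N, M, l), then the output shape is (BATCH_SIZE, N, M, value_passed_to_argument_units_in_Dense), and so on.

Note:

This happens only with Dense neurons, because they do not change the intermediate dimensions between batch_size and last_channel.

With other neurons such as Conv2D->(Max/Avg)pooling, however, the intermediate dimensions may also change (depending on the arguments passed), because those neurons act on these dimensions as well.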

Answer 6

According to Keras:

Dense layer is applied on the last axis independently. [1]

https://github.com/keras-team/keras/issues/10736#issuecomment-406589140

First example:

input_shape=(784,)
model.add(Dense(units=4,activation='linear',input_shape=(784,)))

This says the input has only 784 rows. The first layer of the model has 4 units. Each unit in the dense layer is connected to all 784 rows.

That is why

Output shape=  (None, 4)

None stands for batch_size, which is not known here.

Second example:

Here the input is a rank-2 tensor

input_shape=(784,1)
Units = 4

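So now the input has 784 rows and 1 column. Each unit of the dense layer is connected to the 1 element in each of the 784 rows. Output shape = (None, 784, 4)
None is the batch size.

Third example:

input_shape=(32,28)

Now each unit of the dense layer is connected to the 28 elements in each of the 32 rows. So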

output_shape=(None,32,4)

Last example:

model.add(Dense(units=4,activation='linear',input_shape=(32,28,1)))

The dense layer is again applied to the last axis, and the output shape becomes

Output Shape =(None,32,28,4)

Note:

The rank of (784,) is 1; the comma does not stand for another dimension. The rank of (784,1) is 2.
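The point about the comma is ordinary Python tuple syntax and can be checked without Keras. In the small sketch below the variable names are illustrative only.

```python
one_d = (784,)        # trailing comma: a 1-element tuple, so rank 1
two_d = (784, 1)      # 2-element tuple, so rank 2
not_a_tuple = (784)   # no comma: just the integer 784 in parentheses

# The rank is the tuple length; Dense acts on the last entry.
print(len(one_d), one_d[-1])
print(len(two_d), two_d[-1])
print(type(not_a_tuple).__name__)
```

The last entry is 784 for (784,) and 1 for (784, 1), and that is the axis the Dense layer replaces with units.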

The diagram in the stackoverflow post may help you further.

6 answers

Answer 1  3  accepted  2020-05-02 15:58:16
Answer 2  2  2020-05-02 16:10:32
Answer 3  1  2020-05-02 15:27:45
Answer 4  1  2020-05-02 16:23:39
Answer 5  1  2020-05-02 16:35:54
Answer 6  1  2020-05-02 16:50:59
