Why does reshaping my data completely change the behaviour of a fully connected neural network in Keras?
I would love some insight on this. I'm working on a regression problem in Keras with a simple neural network. I have train and test data. The training data consists of 33230 samples with 20020 features (which is a ton of features for this amount of data, but that's another story; the features are just various measurements). The test set is 8308 samples with the same number of features. My data is in a pandas dataframe, and I convert it into numpy arrays, which look as expected:
X_train = np.array(X_train_df)
X_train.shape
(33230, 20020)
X_test = np.array(X_test_df)
X_test.shape
(8308, 20020)
If I pass this into the following fully connected model, it trains very quickly and produces terrible results on the test set:
Model:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(20020,)))
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Fit:
model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), batch_size=128, shuffle=True, epochs=100)
Results after 5 epochs (the pattern doesn't change substantially after this: training loss goes down, validation loss shoots up):
Train on 33230 samples, validate on 8308 samples
Epoch 1/100
33230/33230 [==============================] - 11s 322us/sample - loss: 217.6460 - mean_absolute_error: 9.6896 - val_loss: 92.2517 - val_mean_absolute_error: 7.6400
Epoch 2/100
33230/33230 [==============================] - 10s 308us/sample - loss: 70.0501 - mean_absolute_error: 7.0170 - val_loss: 90.1813 - val_mean_absolute_error: 7.5721
Epoch 3/100
33230/33230 [==============================] - 10s 309us/sample - loss: 62.5253 - mean_absolute_error: 6.6401 - val_loss: 104.1333 - val_mean_absolute_error: 8.0131
Epoch 4/100
33230/33230 [==============================] - 11s 335us/sample - loss: 55.6250 - mean_absolute_error: 6.2346 - val_loss: 142.8665 - val_mean_absolute_error: 9.3112
Epoch 5/100
33230/33230 [==============================] - 10s 311us/sample - loss: 51.7378 - mean_absolute_error: 5.9570 - val_loss: 208.8995 - val_mean_absolute_error: 11.4158
However, if I reshape the data:
X_test = X_test.reshape(8308, 20020, 1)
X_train = X_train.reshape(33230, 20020, 1)
And then use the same model with a Flatten() after the first layer:
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(20020,1)))
model.add(Flatten())
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
Then my results look very different, and much better:
Train on 33230 samples, validate on 8308 samples
Epoch 1/100
33230/33230 [==============================] - 1117s 34ms/sample - loss: 112.4860 - mean_absolute_error: 7.5939 - val_loss: 59.3871 - val_mean_absolute_error: 6.2453
Epoch 2/100
33230/33230 [==============================] - 1112s 33ms/sample - loss: 4.7877 - mean_absolute_error: 1.6323 - val_loss: 23.8041 - val_mean_absolute_error: 3.8226
Epoch 3/100
33230/33230 [==============================] - 1116s 34ms/sample - loss: 2.3945 - mean_absolute_error: 1.1755 - val_loss: 14.9597 - val_mean_absolute_error: 2.8702
Epoch 4/100
33230/33230 [==============================] - 1113s 33ms/sample - loss: 1.5722 - mean_absolute_error: 0.9616 - val_loss: 15.0566 - val_mean_absolute_error: 2.9075
Epoch 5/100
33230/33230 [==============================] - 1117s 34ms/sample - loss: 1.4161 - mean_absolute_error: 0.9179 - val_loss: 11.5235 - val_mean_absolute_error: 2.4781
It also takes 1000x longer, but performs well on the test set. I don't understand why this happens. Can someone shed light on this? I'm guessing I'm missing something really basic, but I can't figure out what.
A very good question. First of all, you have to understand how the network actually works. A Dense layer is a fully connected layer, so each of its neurons is connected to every neuron of the previous layer. The 1000x slowdown you mention has nothing to do with your training data; it comes from your network. Your second network is so big that I was unable to fit it in my RAM, or in Google Colab. So for demonstration purposes I will assume that your training data has shape (500, 100).
For the first network you posted, using the shape above, the model looks like this:
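# Assumes tf.keras; with standalone Keras, import from keras.* instead.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(300, activation="relu", input_shape=(100,)))
model.add(Dense(300, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
model.summary()  # prints the layer table shown below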
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 300) 30300
_________________________________________________________________
dense_3 (Dense) (None, 300) 90300
_________________________________________________________________
dense_4 (Dense) (None, 100) 30100
_________________________________________________________________
dense_5 (Dense) (None, 1) 101
=================================================================
Total params: 150,801
Trainable params: 150,801
Non-trainable params: 0
_________________________________________________________________
Take note of the total params: 150,801. Now take your second example:
from tensorflow.keras.layers import Flatten  # assumes tf.keras
model1 = Sequential()
model1.add(Dense(300, activation="relu", input_shape=(100,1)))
model1.add(Flatten())
model1.add(Dense(300, activation="relu"))
model1.add(Dense(100, activation="relu"))
model1.add(Dense(1, activation='linear'))
model1.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
model1.summary()  # prints the layer table shown below
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_14 (Dense) (None, 100, 300) 600
_________________________________________________________________
flatten_2 (Flatten) (None, 30000) 0
_________________________________________________________________
dense_15 (Dense) (None, 300) 9000300
_________________________________________________________________
dense_16 (Dense) (None, 100) 30100
_________________________________________________________________
dense_17 (Dense) (None, 1) 101
=================================================================
Total params: 9,031,101
Trainable params: 9,031,101
Non-trainable params: 0
_________________________________________________________________
Your total params increase to 9,031,101. You can imagine what happens when you use your actual data, which has 20020 features: the model grows enormously, and I was unable to even fit it in my RAM.
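To put rough numbers on this, here is a small sketch that applies the same per-layer count the summaries use, units * (input_size + 1), to the real feature count of 20020. The helper names (dense_params, flat_model_params, reshaped_model_params) are mine, not Keras functions. The memory figure assumes float32 weights (4 bytes each) and counts only the weights, not optimizer state or activations.

```python
def dense_params(input_size, units):
    """Weights plus biases of one Dense layer: units * (input_size + 1)."""
    return units * (input_size + 1)


def flat_model_params(n_features):
    """First model: Dense(300) -> Dense(300) -> Dense(100) -> Dense(1)
    on input of shape (n_features,)."""
    return (dense_params(n_features, 300)
            + dense_params(300, 300)
            + dense_params(300, 100)
            + dense_params(100, 1))


def reshaped_model_params(n_features):
    """Second model: Dense(300) on input of shape (n_features, 1),
    then Flatten, then Dense(300) -> Dense(100) -> Dense(1)."""
    first_layer = dense_params(1, 300)    # kernel acts on the last axis (size 1)
    flattened_size = n_features * 300     # Flatten output size
    return (first_layer
            + dense_params(flattened_size, 300)
            + dense_params(300, 100)
            + dense_params(100, 1))


# Sanity check against the two summaries for the (500, 100) demo data.
assert flat_model_params(100) == 150_801
assert reshaped_model_params(100) == 9_031_101

n_features = 20020
small = flat_model_params(n_features)
big = reshaped_model_params(n_features)

print(f"first model:  {small:,} parameters")
print(f"second model: {big:,} parameters")
# Assumption: float32 weights, 4 bytes per parameter, weights only.
print(f"second model weights: about {big * 4 / 1e9:.1f} GB")
```

Under these assumptions the second model needs several gigabytes for its weights alone, before the optimizer keeps any extra per-parameter state, which is consistent with it not fitting in memory on an ordinary machine.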
So to conclude, your second model has a huge number of parameters compared to the first model. This is the reason for the slow training, and possibly for the better performance: more parameters can make the learning better. I can't say what makes it better without actually looking at your data, but more parameters can contribute to better performance.
Note: if you remove the Flatten layer, your network's parameter count decreases. Here is the example:
model1 = Sequential()
model1.add(Dense(300, activation="relu", input_shape=(100,1)))
model1.add(Dense(300, activation="relu"))
model1.add(Dense(100, activation="relu"))
model1.add(Dense(1, activation='linear'))
model1.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
model1.summary()  # prints the layer table shown below
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_18 (Dense) (None, 100, 300) 600
_________________________________________________________________
dense_19 (Dense) (None, 100, 300) 90300
_________________________________________________________________
dense_20 (Dense) (None, 100, 100) 30100
_________________________________________________________________
dense_21 (Dense) (None, 100, 1) 101
=================================================================
Total params: 121,101
Trainable params: 121,101
Non-trainable params: 0
_________________________________________________________________
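The 600 in the first row of this summary comes from how Dense treats inputs with more than two dimensions: it applies one shared kernel along the last axis. With input shape (100, 1) that is 1 weight per unit plus a bias, so 300 * (1 + 1) = 600, regardless of the 100 positions. The pure-Python sketch below mimics that behaviour on a tiny example. It is for illustration only: `dense_last_axis` is a hypothetical helper rather than a Keras function, the weights are arbitrary, and the ReLU is left out.

```python
def dense_last_axis(x, kernel, bias):
    """Apply one Dense-style affine map to every row of x.

    x:      list of rows, each of length input_size
    kernel: input_size x units weights, shared by all rows
    bias:   units values, shared by all rows
    """
    units = len(bias)
    return [
        [sum(row[i] * kernel[i][u] for i in range(len(row))) + bias[u]
         for u in range(units)]
        for row in x
    ]


# One sample of shape (4, 1): 4 feature positions, last axis of size 1.
sample = [[1.0], [2.0], [3.0], [4.0]]
units = 3
kernel = [[0.5, -1.0, 2.0]]   # shape (1, 3): input_size x units
bias = [0.0, 1.0, -1.0]       # shape (3,)

out = dense_last_axis(sample, kernel, bias)

# Parameter count depends only on the last axis, not on the 4 positions.
n_params = len(kernel) * units + len(bias)
# What a Flatten layer would hand to the next Dense layer.
flattened_size = len(out) * len(out[0])

print("output shape:", (len(out), len(out[0])))
print("parameters:", n_params)
print("size after flattening:", flattened_size)
```

This is also why the final output shape without Flatten is (None, 100, 1): every layer keeps the 100-position axis, so the network emits one value per feature position rather than one per sample.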
I hope my answer helped you understand what is happening and what the difference between the two models is.
UPDATE: 20/07
Regarding your comment, I thought it better to update the answer for more clarity. Your question is: how does the number of parameters relate to the shape of the network?
To be honest, I do not clearly understand what you mean by this, but I will still try to answer it. The more layers or neurons you add, the larger the network and the number of trainable parameters become.
So your actual question is why the Flatten layer increases your parameter count. For that you need to understand how parameters are calculated.
model.add(Dense(300, activation="relu", input_shape=(100,)))
Consider this as your first layer: the number of parameters is units * (input_size + 1), which here is 300 * (100 + 1) = 30300. Now when you add the Flatten layer, it does not add any parameters by itself, but the output of the Flatten layer is the input to the following layer. So consider the following example.
_________________________________________________________________
flatten_2 (Flatten) (None, 30000) 0
_________________________________________________________________
dense_15 (Dense) (None, 300) 9000300
_________________________________________________________________
Here you can see that the output size of the Flatten layer is 30000. Using the formula above, 300 * (30000 + 1) gives 9000300 parameters, which is a huge number in itself. More parameters can help the network learn more features and might help it achieve better results. But it always depends on the data; you will have to experiment with it.
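Written out, the count for a single Dense layer is its weights plus its biases. The symbols n_in (input size) and n_out (number of units) are introduced here only for illustration:

```latex
\begin{align*}
P_{\text{dense}} &= n_{\text{out}} \, (n_{\text{in}} + 1) \\
\text{Dense(300) on 100 inputs:} \quad P &= 300 \times (100 + 1) = 30300 \\
\text{Dense(300) after Flatten:} \quad P &= 300 \times (30000 + 1) = 9000300
\end{align*}
```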
I hope the above explanations have cleared your doubts.