Keras: simple neural network with simple data not working

I am trying to create a very simple neural network: one hidden layer with 2 neurons, for some very simple data with only one feature.

import numpy as np
X=np.concatenate([np.linspace(0,10,100),np.linspace(11,20,100),np.linspace(21,30,100)])
y=np.concatenate([np.repeat(0,100),np.repeat(1,100),np.repeat(0,100)])

[Scatter plot of the data: y against X]

Here is the model:

from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(2, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(X, y, epochs=200)

In theory, this model should work. But even after 1000 epochs, the accuracy is still 0.667.

Epoch 999/1000
10/10 [==============================] - 0s 1ms/step - loss: 0.5567 - accuracy: 0.6667
Epoch 1000/1000
10/10 [==============================] - 0s 2ms/step - loss: 0.5566 - accuracy: 0.6667

I think that I did something wrong. Could you suggest some modification?

It seems that there are a lot of local minima, and the initialization can change the final model. That was the case when I tested with a neural-network package in R: I had to try many seeds, and I found this model (among others).

[Diagram of the network found in R: one hidden layer with 2 neurons]

And this is the structure that I wanted to create with Keras: one hidden layer with 2 neurons. The activation function is sigmoid.

So I am wondering if Keras has the same problem with initialization. With that R package, I thought it was simply not a "perfect" package, and I expected Keras to perform better. If the initialization is important, does Keras try different initializations? If not, why not? Maybe because, in general, with more data (and more features) it works better without testing many initializations?

For example, with kmeans, it seems that different initializations are tested.
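
For instance, scikit-learn's KMeans runs several random initializations and keeps the best one via its n_init parameter. A minimal sketch with random toy data:

import numpy as np
from sklearn.cluster import KMeans

data = np.random.rand(100, 2)                                    # toy data, just for illustration
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)   # 10 restarts, best inertia kept
print(km.inertia_)                                               # score of the best run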

This question shows the importance of input data normalization for neural networks. Without normalization, training neural networks is sometimes hard because the optimization can get stuck in local minima.

I want to start with a visualization of the dataset. The dataset is 1D, and after standard normalization it looks like the following.

import numpy as np
import matplotlib.pyplot as plt

X_original = np.concatenate([np.linspace(0, 10, 100),
                             np.linspace(11, 20, 100),
                             np.linspace(21, 30, 100)])
X = (X_original - X_original.mean()) / X_original.std()   # standard normalization

y = np.concatenate([np.repeat(0, 100), np.repeat(1, 100), np.repeat(0, 100)])

plt.figure()
plt.scatter(X, np.zeros(X.shape[0]), c=y)                  # 1D points on a line, colored by class
plt.show()

[Scatter plot of the normalized 1D data, colored by class]

The best way to separate these data points into their respective classes is to draw two decision boundaries in the input space. Since the input space is 1D, the classification boundaries are just points.

This implies that a single-layer network such as logistic regression cannot classify this dataset, but a two-layer neural network with nonlinear activations should be able to.
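
To see why, here is a hand-constructed sketch (the weights are picked by hand rather than learned, and the boundary positions are rough assumptions in normalized units): each hidden sigmoid marks one boundary, and the output unit is high only between them.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tiny_net(x, b1=-0.52, b2=0.63, k=20.0):
    # b1, b2: approximate class boundaries in normalized units (assumed values)
    h1 = sigmoid(k * (x - b1))               # ~1 to the right of the first boundary
    h2 = sigmoid(k * (x - b2))               # ~1 to the right of the second boundary
    return sigmoid(10.0 * (h1 - h2) - 5.0)   # high only between the two boundaries

x = np.array([-1.5, 0.0, 1.5])               # one point from each segment, on the normalized scale
print(tiny_net(x))                           # roughly [0, 1, 0]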

Now, with the normalization and the following training script, the model can easily learn to classify the points.

import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(2, activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adam(1e-1),   # Adam with learning rate 0.1
              metrics=['accuracy'])
model.fit(X, y, epochs=20)

Train on 300 samples
Epoch 1/20
300/300 [==============================] - 1s 2ms/sample - loss: 0.6455 - accuracy: 0.6467
Epoch 2/20
300/300 [==============================] - 0s 79us/sample - loss: 0.6493 - accuracy: 0.6667
Epoch 3/20
300/300 [==============================] - 0s 85us/sample - loss: 0.6397 - accuracy: 0.6667
Epoch 4/20
300/300 [==============================] - 0s 100us/sample - loss: 0.6362 - accuracy: 0.6667
Epoch 5/20
300/300 [==============================] - 0s 115us/sample - loss: 0.6342 - accuracy: 0.6667
Epoch 6/20
300/300 [==============================] - 0s 96us/sample - loss: 0.6317 - accuracy: 0.6667
Epoch 7/20
300/300 [==============================] - 0s 93us/sample - loss: 0.6110 - accuracy: 0.6667
Epoch 8/20
300/300 [==============================] - 0s 110us/sample - loss: 0.5746 - accuracy: 0.6667
Epoch 9/20
300/300 [==============================] - 0s 142us/sample - loss: 0.5103 - accuracy: 0.6900
Epoch 10/20
300/300 [==============================] - 0s 124us/sample - loss: 0.4207 - accuracy: 0.9367
Epoch 11/20
300/300 [==============================] - 0s 124us/sample - loss: 0.3283 - accuracy: 0.9833
Epoch 12/20
300/300 [==============================] - 0s 124us/sample - loss: 0.2553 - accuracy: 0.9800
Epoch 13/20
300/300 [==============================] - 0s 138us/sample - loss: 0.2030 - accuracy: 1.0000
Epoch 14/20
300/300 [==============================] - 0s 124us/sample - loss: 0.1624 - accuracy: 1.0000
Epoch 15/20
300/300 [==============================] - 0s 150us/sample - loss: 0.1375 - accuracy: 1.0000
Epoch 16/20
300/300 [==============================] - 0s 122us/sample - loss: 0.1161 - accuracy: 1.0000
Epoch 17/20
300/300 [==============================] - 0s 115us/sample - loss: 0.1025 - accuracy: 1.0000
Epoch 18/20
300/300 [==============================] - 0s 126us/sample - loss: 0.0893 - accuracy: 1.0000
Epoch 19/20
300/300 [==============================] - 0s 121us/sample - loss: 0.0804 - accuracy: 1.0000
Epoch 20/20
300/300 [==============================] - 0s 132us/sample - loss: 0.0720 - accuracy: 1.0000

Since the model is very simple, the choice of learning rate and optimizer affects the speed of learning. With the SGD optimizer and a learning rate of 1e-1, the model may take longer to train than with the Adam optimizer at the same learning rate.
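
A rough way to compare the two on the same normalized data (results will vary with the random initialization):

import keras
from keras.models import Sequential
from keras.layers import Dense

for opt_name, opt in [('sgd', keras.optimizers.SGD(1e-1)),
                      ('adam', keras.optimizers.Adam(1e-1))]:
    m = Sequential([Dense(2, activation='sigmoid', input_dim=1),
                    Dense(1, activation='sigmoid')])
    m.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    m.fit(X, y, epochs=50, verbose=0)            # X, y: the normalized data from above
    loss, acc = m.evaluate(X, y, verbose=0)
    print(opt_name, 'accuracy after 50 epochs:', acc)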
