
Neural net fails on toy dataset

I have created the following toy dataset: [scatter plot of the dataset]

I am trying to predict the class with a neural net in keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape= (nr_feats,)))
model.add(Dense(units=nr_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

With nr_feats and nr_classes both set to 2, the neural net only ever reaches 50 percent accuracy, predicting either all 1's or all 2's. Using logistic regression results in 100 percent accuracy.
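For reference, the logistic regression baseline is essentially the following (a minimal sketch rather than the exact code from my notebook; it uses x_train and the raw label column of df_train as prepared in the preprocessing code further down):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# fit directly on the raw labels; logistic regression does not care whether they are 0/1 or 1/2
log_reg = LogisticRegression()
log_reg.fit(x_train, df_train.iloc[:, -1].values)
accuracy_score(df_train.iloc[:, -1].values, log_reg.predict(x_train))  # reaches 1.0 on this data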

I cannot figure out what is going wrong here.

I have uploaded a notebook to github if you want to quickly try something.

EDIT 1

I drastically increased the number of epochs; accuracy finally starts to improve from 0.5 at epoch 72 and converges to 1.0 at epoch 98. This still seems extremely slow for such a simple dataset.

I am aware it is better to use a single output neuron with sigmoid activation, but I mainly want to understand why it does not work with two output neurons and softmax activation.

I pre-process my dataframe as follows:

import keras
from sklearn.preprocessing import LabelEncoder

# features are all columns except the last; the last column holds the class labels
x_train = df_train.iloc[:,0:-1].values
y_train = df_train.iloc[:, -1]

nr_feats = x_train.shape[1]
nr_classes = y_train.nunique()

# map the labels to 0..nr_classes-1 and one-hot encode them
label_enc = LabelEncoder()
label_enc.fit(y_train)

y_train = keras.utils.to_categorical(label_enc.transform(y_train), nr_classes)

Training and evaluation:

from sklearn.metrics import accuracy_score

model.fit(x_train, y_train, epochs=500, batch_size=32, verbose=True)
accuracy_score(model.predict_classes(x_train), df_train.iloc[:, -1].values)

EDIT 2

After changing the output layer to a single neuron with sigmoid activation and using binary_crossentropy loss as modesitt suggested, accuracy still remains at 0.5 for 200 epochs and converges to 1.0 only 100 epochs later.

The issue is that your labels are 1 and 2 instead of 0 and 1. Keras will not raise an error when it sees 2, but it is not capable of predicting 2.

Subtract 1 from all your y values. As a side note, it is common in deep learning to use 1 neuron with sigmoid activation for binary classification (0 or 1) rather than 2 classes with softmax. Finally, use binary_crossentropy as the loss for binary classification problems.
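A minimal sketch of that suggestion, reusing the variable names from the question (y_train_binary is just an illustrative name, and the epoch count is the one from the original training call):

from keras.models import Sequential
from keras.layers import Dense

# shift the raw 1/2 labels down to 0/1 (i.e. subtract 1)
y_train_binary = df_train.iloc[:, -1].values - 1

model = Sequential()
model.add(Dense(units=2, activation='sigmoid', input_shape=(nr_feats,)))
model.add(Dense(units=1, activation='sigmoid'))   # single output neuron

model.compile(loss='binary_crossentropy',         # binary loss for a 0/1 target
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train_binary, epochs=500, batch_size=32, verbose=True)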

Note: Read the "Update" section at the end of my answer if you want the real reason. In this scenario, the other two reasons I mention are only valid when the learning rate is set to a low value (less than 1e-3).


I put together some code. It is very similar to yours, but I cleaned it up a little and made it simpler for myself. As you can see, I use a dense layer with one unit and a sigmoid activation function as the last layer, and I just changed the optimizer from adam to rmsprop (this is not that important; you can use adam if you like):

import numpy as np

# generate random data with two features
n_samples = 200
n_feats = 2

cls0 = np.random.uniform(low=0.2, high=0.4, size=(n_samples,n_feats))
cls1 = np.random.uniform(low=0.5, high=0.7, size=(n_samples,n_feats))
x_train = np.concatenate((cls0, cls1))
y_train = np.concatenate((np.zeros((n_samples,)), np.ones((n_samples,))))

# shuffle data because all negatives (i.e. class "0") are first
# and then all positives (i.e. class "1")
indices = np.arange(x_train.shape[0])
np.random.shuffle(indices)
x_train = x_train[indices]
y_train = y_train[indices]

from keras.models import Sequential
from keras.layers import Dense


model = Sequential()
model.add(Dense(2, activation='sigmoid', input_shape=(n_feats,)))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.summary()

model.fit(x_train, y_train, epochs=5, batch_size=32, verbose=True)

Here is the output:

Layer (type)                 Output Shape              Param #   
=================================================================
dense_25 (Dense)             (None, 2)                 6         
_________________________________________________________________
dense_26 (Dense)             (None, 1)                 3         
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
400/400 [==============================] - 0s 966us/step - loss: 0.7013 - acc: 0.5000
Epoch 2/5
400/400 [==============================] - 0s 143us/step - loss: 0.6998 - acc: 0.5000
Epoch 3/5
400/400 [==============================] - 0s 137us/step - loss: 0.6986 - acc: 0.5000
Epoch 4/5
400/400 [==============================] - 0s 149us/step - loss: 0.6975 - acc: 0.5000
Epoch 5/5
400/400 [==============================] - 0s 132us/step - loss: 0.6966 - acc: 0.5000

As you can see, the accuracy never increases from 50%. What if you increase the number of epochs to, say, 50?

Layer (type)                 Output Shape              Param #   
=================================================================
dense_35 (Dense)             (None, 2)                 6         
_________________________________________________________________
dense_36 (Dense)             (None, 1)                 3         
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________
Epoch 1/50
400/400 [==============================] - 0s 1ms/step - loss: 0.6925 - acc: 0.5000
Epoch 2/50
400/400 [==============================] - 0s 136us/step - loss: 0.6902 - acc: 0.5000
Epoch 3/50
400/400 [==============================] - 0s 133us/step - loss: 0.6884 - acc: 0.5000
Epoch 4/50
400/400 [==============================] - 0s 160us/step - loss: 0.6866 - acc: 0.5000
Epoch 5/50
400/400 [==============================] - 0s 140us/step - loss: 0.6848 - acc: 0.5000
Epoch 6/50
400/400 [==============================] - 0s 168us/step - loss: 0.6832 - acc: 0.5000
Epoch 7/50
400/400 [==============================] - 0s 154us/step - loss: 0.6817 - acc: 0.5000
Epoch 8/50
400/400 [==============================] - 0s 146us/step - loss: 0.6802 - acc: 0.5000
Epoch 9/50
400/400 [==============================] - 0s 161us/step - loss: 0.6789 - acc: 0.5000
Epoch 10/50
400/400 [==============================] - 0s 140us/step - loss: 0.6778 - acc: 0.5000
Epoch 11/50
400/400 [==============================] - 0s 177us/step - loss: 0.6766 - acc: 0.5000
Epoch 12/50
400/400 [==============================] - 0s 180us/step - loss: 0.6755 - acc: 0.5000
Epoch 13/50
400/400 [==============================] - 0s 165us/step - loss: 0.6746 - acc: 0.5000
Epoch 14/50
400/400 [==============================] - 0s 128us/step - loss: 0.6736 - acc: 0.5000
Epoch 15/50
400/400 [==============================] - 0s 125us/step - loss: 0.6728 - acc: 0.5000
Epoch 16/50
400/400 [==============================] - 0s 165us/step - loss: 0.6718 - acc: 0.5000
Epoch 17/50
400/400 [==============================] - 0s 161us/step - loss: 0.6710 - acc: 0.5000
Epoch 18/50
400/400 [==============================] - 0s 170us/step - loss: 0.6702 - acc: 0.5000
Epoch 19/50
400/400 [==============================] - 0s 122us/step - loss: 0.6694 - acc: 0.5000
Epoch 20/50
400/400 [==============================] - 0s 110us/step - loss: 0.6686 - acc: 0.5000
Epoch 21/50
400/400 [==============================] - 0s 142us/step - loss: 0.6676 - acc: 0.5000
Epoch 22/50
400/400 [==============================] - 0s 142us/step - loss: 0.6667 - acc: 0.5000
Epoch 23/50
400/400 [==============================] - 0s 149us/step - loss: 0.6659 - acc: 0.5000
Epoch 24/50
400/400 [==============================] - 0s 125us/step - loss: 0.6651 - acc: 0.5000
Epoch 25/50
400/400 [==============================] - 0s 134us/step - loss: 0.6643 - acc: 0.5000
Epoch 26/50
400/400 [==============================] - 0s 143us/step - loss: 0.6634 - acc: 0.5000
Epoch 27/50
400/400 [==============================] - 0s 137us/step - loss: 0.6625 - acc: 0.5000
Epoch 28/50
400/400 [==============================] - 0s 131us/step - loss: 0.6616 - acc: 0.5025
Epoch 29/50
400/400 [==============================] - 0s 119us/step - loss: 0.6608 - acc: 0.5100
Epoch 30/50
400/400 [==============================] - 0s 143us/step - loss: 0.6601 - acc: 0.5025
Epoch 31/50
400/400 [==============================] - 0s 148us/step - loss: 0.6593 - acc: 0.5350
Epoch 32/50
400/400 [==============================] - 0s 161us/step - loss: 0.6584 - acc: 0.5325
Epoch 33/50
400/400 [==============================] - 0s 152us/step - loss: 0.6576 - acc: 0.5700
Epoch 34/50
400/400 [==============================] - 0s 128us/step - loss: 0.6568 - acc: 0.5850
Epoch 35/50
400/400 [==============================] - 0s 155us/step - loss: 0.6560 - acc: 0.5975
Epoch 36/50
400/400 [==============================] - 0s 136us/step - loss: 0.6552 - acc: 0.6425
Epoch 37/50
400/400 [==============================] - 0s 140us/step - loss: 0.6544 - acc: 0.6150
Epoch 38/50
400/400 [==============================] - 0s 120us/step - loss: 0.6538 - acc: 0.6375
Epoch 39/50
400/400 [==============================] - 0s 140us/step - loss: 0.6531 - acc: 0.6725
Epoch 40/50
400/400 [==============================] - 0s 135us/step - loss: 0.6523 - acc: 0.6750
Epoch 41/50
400/400 [==============================] - 0s 136us/step - loss: 0.6515 - acc: 0.7300
Epoch 42/50
400/400 [==============================] - 0s 126us/step - loss: 0.6505 - acc: 0.7450
Epoch 43/50
400/400 [==============================] - 0s 141us/step - loss: 0.6496 - acc: 0.7425
Epoch 44/50
400/400 [==============================] - 0s 162us/step - loss: 0.6489 - acc: 0.7675
Epoch 45/50
400/400 [==============================] - 0s 161us/step - loss: 0.6480 - acc: 0.7775
Epoch 46/50
400/400 [==============================] - 0s 126us/step - loss: 0.6473 - acc: 0.7575
Epoch 47/50
400/400 [==============================] - 0s 124us/step - loss: 0.6464 - acc: 0.7625
Epoch 48/50
400/400 [==============================] - 0s 130us/step - loss: 0.6455 - acc: 0.7950
Epoch 49/50
400/400 [==============================] - 0s 191us/step - loss: 0.6445 - acc: 0.8100
Epoch 50/50
400/400 [==============================] - 0s 163us/step - loss: 0.6435 - acc: 0.8625

The accuracy starts to increase. (Note that if you train this model multiple times, it may take a different number of epochs each time to reach an acceptable accuracy, anywhere from 10 to 100 epochs.)

Also, in my experiments I noticed that increasing the number of units in the first dense layer, for example to 5 or 10 units, makes the model train faster (i.e. converge more quickly).

Why are so many epochs needed?

I think it is because of these two reasons (combined):

1) Despite the fact that the two classes are easily separable, your data is made up of random samples, and

2) The number of data points compared to the size of the neural net (i.e. the number of trainable parameters, which is 9 in the example code above) is relatively large.

Therefore, it takes more epochs for the model to learn the weights. It is as though the model is very restricted and needs more and more experience to find the appropriate weights. As evidence, just try increasing the number of units in the first dense layer, as sketched below; you are almost guaranteed to reach an accuracy above 90% in fewer than 10 epochs each time you train this model. Here you increase the capacity, so the model converges (i.e. trains) much faster. (Note that it starts to overfit if the capacity is too high or you train the model for too many epochs; you should have a validation scheme to monitor this.)
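A rough sketch of that capacity experiment (10 units is just one of the values I tried, and the validation_split argument stands in for the validation scheme mentioned above; it is not part of the original code):

model = Sequential()
model.add(Dense(10, activation='sigmoid', input_shape=(n_feats,)))  # wider first layer = more capacity
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# with the extra capacity, accuracy passes 90% well within 10 epochs on this data;
# the held-out split helps you notice when the model starts to overfit
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=True)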

Side note:

Don't set the high argument to a number less than the low argument in numpy.random.uniform since, according to the documentation, the results are "officially undefined" in that case.

Update:

One more important thing here (maybe the most important thing in this scenario) is the learning rate of the optimizer. If the learning rate is too low, the model converges slowly. Try increasing the learning rate, and you can see that you reach an accuracy of 100% in fewer than 5 epochs:

from keras import optimizers

model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-1),
              metrics=['accuracy'])

# or you may use adam
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(lr=1e-1),
              metrics=['accuracy'])
