Neural network always predicts the same class in each epoch
I'm trying to implement an MNIST classifier with a DNN. However, the result I got is quite strange.
For example, in one epoch the model predicts only the digit '0' correctly and gets every other digit wrong. In every epoch the model can only predict one specific digit correctly (and that digit is different in each epoch).
This is how I load the dataset:
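```python
from sklearn.datasets import fetch_openml
from tensorflow.keras.utils import to_categorical  # keras.utils.np_utils is not available in newer Keras versions
import numpy as np
from sklearn.model_selection import train_test_split
import time

# as_frame=False returns NumPy arrays (newer scikit-learn returns a pandas DataFrame by default)
x, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
x = (x/255.).astype('float32')
y = to_categorical(y.astype(int))  # the labels come back as strings '0'..'9'
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.15, random_state=42)
```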
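If you would rather not pull in Keras just for label encoding, the same one-hot encoding that `to_categorical` produces can be done with plain NumPy. This is a minimal sketch; `one_hot` is a hypothetical helper name, not part of any library, and it assumes the labels are integers or digit strings as returned by `fetch_openml`:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Hypothetical helper: convert digit labels (ints or digit strings) to one-hot rows."""
    labels = np.asarray(labels).astype(int)            # '5' -> 5 for string labels
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # set exactly one column per row
    return encoded

print(one_hot(np.array(['5', '0', '4'])))
```

With this helper, `y = one_hot(y)` would replace the `to_categorical` call.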
This is my model: a DNN with two hidden layers, using ReLU in the hidden layers and softmax at the output, with cross-entropy as the loss function. I'm not really sure whether my backpropagation is correct; I think something is wrong there.
```python
import numpy as np

class NN():
    def __init__(self, input_size, hidden_1_size, hidden_2_size, output_size):
        self.input_data = np.random.randn(1, input_size)
        self.w1 = np.random.randn(input_size, hidden_1_size)
        self.b1 = np.random.randn(1, hidden_1_size)
        self.w2 = np.random.randn(hidden_1_size, hidden_2_size)
        self.b2 = np.random.randn(1, hidden_2_size)
        self.w3 = np.random.randn(hidden_2_size, output_size)
        self.b3 = np.random.randn(1, output_size)

    def Sigmoid(self, z):
        return np.clip(1 / (1.0 + np.exp(-z)), 1e-8, 1 - (1e-7))

    def Softmax(self, z):
        y_logit = np.exp(z - np.max(z, 1, keepdims=True))
        y = y_logit / np.sum(y_logit, 1, keepdims=True)
        return y

    def Relu(self, z):
        return np.maximum(z, 0)

    def acc_test(self, input_data):
        tmp_h1 = self.Relu(input_data.dot(self.w1) + self.b1)
        tmp_h2 = self.Relu(self.h1_out.dot(self.w2) + self.b2)
        tmp_out = self.Softmax(self.h2_out.dot(self.w3) + self.b3)
        return tmp_out

    # Feed Placeholder
    def forward(self, input_data):
        self.input_data = input_data
        self.h1_out = self.Relu(input_data.dot(self.w1) + self.b1)
        self.h2_out = self.Relu(self.h1_out.dot(self.w2) + self.b2)
        self.output_layer = self.Softmax(self.h2_out.dot(self.w3) + self.b3)

    # Backward Propagation
    def backward(self, target):
        # cross_entropy loss derivative
        Loss_to_z_grad = (self.output_layer - target) # correct
        self.b3_grad = Loss_to_z_grad
        self.w3_grad = self.h2_out.T.dot(Loss_to_z_grad) # correct
        Activation_2_grad = Loss_to_z_grad.dot(self.w3.T) # correct
        Activation_2_grad[Activation_2_grad<0] = 0
        self.b2_grad = Activation_2_grad
        self.w2_grad = self.h1_out.T.dot(Activation_2_grad)
        Activation_1_grad = Activation_2_grad.dot(self.w2.T)
        Activation_1_grad[Activation_1_grad<0] = 0
        self.b1_grad = Activation_1_grad
        self.w1_grad = self.input_data.T.dot(Activation_1_grad)

    # Update Weights
    def update(self, learning_rate=1e-06):
        self.w1 = self.w1 - learning_rate * self.w1_grad
        self.b1 = self.b1 - learning_rate * self.b1_grad
        self.w2 = self.w2 - learning_rate * self.w2_grad
        self.b2 = self.b2 - learning_rate * self.b2_grad
        self.w3 = self.w3 - learning_rate * self.w3_grad
        self.b3 = self.b3 - learning_rate * self.b3_grad

    # Loss Functions
    def cross_entropy(Y, Y_prediction):
        return -(np.matmul(Y, np.log(Y_prediction)) + np.matmul((1-Y), np.log(1-Y_prediction)))

    def print_accuracy(self):
        correct = 0
        loss = 0
        for i in range(y_val.shape[0]):
            self.acc_test(x_val[i])
            index = self.output_layer
            one_hot = 0
            for check in range(y_val[i].shape[0]):
                if y_val[i][check] == 1:
                    one_hot = check
                    break
            if np.argmax(index) == one_hot:
                correct += 1
                # print('correct: ',check)
            # else:
                # print('incorrect: ', check)
        print('accuracy = ', correct/y_val.shape[0])

import random

mnist_nn = NN(input_size = 784, hidden_1_size = 200, hidden_2_size = 200,output_size = 10)
for i in range(1000):
    for j in range(2000):
        index = random.randint(0,x_train.shape[0]-1)
        mnist_nn.forward(x_train[[index]])
        mnist_nn.backward(y_train[index])
        mnist_nn.update()
    print(i)
    mnist_nn.print_accuracy()
```
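For reference, these are the gradients a network of this shape needs for a single sample, written in the code's own naming (`w1`..`w3` as W_1..W_3, `b1`..`b3` as b_1..b_3, `h1_out`/`h2_out` as h_1/h_2); x is a row vector and y the one-hot target. The indicator 1[z > 0] is the ReLU derivative, and because h = ReLU(z) it equals 1[h > 0], so the mask depends on the layer's activation rather than on the sign of the back-propagated gradient:

```latex
\begin{aligned}
z_1 &= x W_1 + b_1, & h_1 &= \operatorname{ReLU}(z_1), \\
z_2 &= h_1 W_2 + b_2, & h_2 &= \operatorname{ReLU}(z_2), \\
z_3 &= h_2 W_3 + b_3, & \hat{y} &= \operatorname{softmax}(z_3), \qquad L = -\sum_{k=1}^{10} y_k \log \hat{y}_k, \\[4pt]
\delta_3 &= \frac{\partial L}{\partial z_3} = \hat{y} - y, & \frac{\partial L}{\partial W_3} &= h_2^{\top} \delta_3, \quad \frac{\partial L}{\partial b_3} = \delta_3, \\
\delta_2 &= \left(\delta_3 W_3^{\top}\right) \odot \mathbf{1}[z_2 > 0], & \frac{\partial L}{\partial W_2} &= h_1^{\top} \delta_2, \quad \frac{\partial L}{\partial b_2} = \delta_2, \\
\delta_1 &= \left(\delta_2 W_2^{\top}\right) \odot \mathbf{1}[z_1 > 0], & \frac{\partial L}{\partial W_1} &= x^{\top} \delta_1, \quad \frac{\partial L}{\partial b_1} = \delta_1.
\end{aligned}
```

In the code, δ_3 is `Loss_to_z_grad`, and δ_3 W_3^T is `Activation_2_grad` before any mask is applied.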
The accuracy is terribly low because the model can only predict one digit. I've seen the question "Neural network always predicts the same class", and I did change ReLU to leaky ReLU, but it doesn't really help.
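Swapping ReLU for leaky ReLU only changes the activation; the backward pass still has to multiply the incoming gradient by the activation's derivative evaluated at the pre-activation. A minimal NumPy sketch, assuming a negative-side slope of `alpha=0.01` (a common choice, used here for illustration); the function names are hypothetical:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Identity for positive inputs, small slope alpha for non-positive inputs
    return np.where(z > 0, z, alpha * z)

def leaky_relu_backward(upstream_grad, z, alpha=0.01):
    # Local derivative depends on the pre-activation z, not on the sign of upstream_grad
    return upstream_grad * np.where(z > 0, 1.0, alpha)

z = np.array([[-2.0, 0.5, 3.0]])
g = np.array([[1.0, -1.0, 2.0]])
print(leaky_relu(z))
print(leaky_relu_backward(g, z))
```

Note that the negative entry of `g` passes through unchanged where `z > 0`: zeroing gradients by their own sign would be a different (and incorrect) rule.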
I think my dataset should be OK, because I used the same dataset to train a DNN with PyTorch and it worked. Also, the weights and biases are initialized with random values.
I've had a quick look over your code and, if I understand it correctly, there may be some issues:
- For the loss, compute the negative sum of `y * log(y_pred)` over the 10 output probabilities and then take the mean across the batch, so you end up with a scalar-valued loss.
- `Activation_2_grad[Activation_2_grad<0] = 0` should be `Activation_2_grad[self.h2_out <= 0] = 0`: the ReLU derivative depends on the layer's activation, not on the sign of the gradient (and since `self.h2_out` is a ReLU output it is never negative, so the mask has to include zero). The same applies to `Activation_1_grad`, which should be masked with `self.h1_out <= 0`.
- Try using tanh to make sure the algorithm is working properly. ReLU can be finicky and is only really manageable if you have control over the dataset, as it requires a fairly even distribution of the classes. (Leaky ReLU is hard to gauge as well.)
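The two fixes in the list above can be checked together with a small, self-contained NumPy sketch (not the original poster's code; the function names and the toy layer sizes are assumptions for illustration). It computes the mean categorical cross-entropy, backpropagates with the ReLU mask taken from each layer's activation, and compares the result against a finite-difference gradient check, a standard way to test a hand-written backward pass independently of the activation choice:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def softmax(z):
    e = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtract the row max for numerical stability
    return e / np.sum(e, axis=1, keepdims=True)

def cross_entropy(y, y_pred, eps=1e-12):
    # Sum -y * log(y_pred) over the classes, then average over the batch -> scalar loss
    return -np.mean(np.sum(y * np.log(y_pred + eps), axis=1))

def forward(params, x):
    w1, b1, w2, b2, w3, b3 = params
    h1 = relu(x @ w1 + b1)
    h2 = relu(h1 @ w2 + b2)
    y_pred = softmax(h2 @ w3 + b3)
    return h1, h2, y_pred

def backward(params, x, y):
    """Analytic gradients of the mean cross-entropy, in the same order as params."""
    w1, b1, w2, b2, w3, b3 = params
    h1, h2, y_pred = forward(params, x)
    n = x.shape[0]
    d3 = (y_pred - y) / n          # dL/dz3 for softmax + mean cross-entropy
    d2 = (d3 @ w3.T) * (h2 > 0)    # ReLU mask from the activation, not the gradient sign
    d1 = (d2 @ w2.T) * (h1 > 0)
    return [x.T @ d1, d1.sum(axis=0, keepdims=True),
            h1.T @ d2, d2.sum(axis=0, keepdims=True),
            h2.T @ d3, d3.sum(axis=0, keepdims=True)]

def numerical_grad(params, x, y, k, idx, h=1e-5):
    """Central finite difference of the loss w.r.t. the entry params[k][idx]."""
    old = params[k][idx]
    params[k][idx] = old + h
    loss_plus = cross_entropy(y, forward(params, x)[2])
    params[k][idx] = old - h
    loss_minus = cross_entropy(y, forward(params, x)[2])
    params[k][idx] = old  # restore the original value
    return (loss_plus - loss_minus) / (2 * h)

# Toy sizes (6 inputs, hidden layers of 5 and 4, 3 classes) keep the check fast
rng = np.random.default_rng(0)
shapes = [(6, 5), (1, 5), (5, 4), (1, 4), (4, 3), (1, 3)]
params = [0.5 * rng.standard_normal(s) for s in shapes]
x = rng.standard_normal((8, 6))
y = np.eye(3)[rng.integers(0, 3, size=8)]  # one-hot targets for a batch of 8

grads = backward(params, x, y)
max_err = 0.0
for k, p in enumerate(params):
    for idx in np.ndindex(p.shape):
        max_err = max(max_err, abs(numerical_grad(params, x, y, k, idx) - grads[k][idx]))
print('loss:', cross_entropy(y, forward(params, x)[2]))
print('max |analytic - numerical| gradient difference:', max_err)
```

The same harness can be pointed at any hand-written `backward`, for example one using tanh as suggested above, before training on the full MNIST data.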
As for whether your backprop is working properly, you might be better off using a Keras Sequential model, e.g.:
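```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # flattened 28x28 MNIST images
    tf.keras.layers.Dense(128, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
])
# The optimizer is an illustrative choice; the loss matches one-hot labels
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```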
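Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.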