
Neural network always predicts the same class in each epoch

I'm trying to implement an MNIST classifier with a DNN. However, the result I get is quite strange. [screenshot of the per-epoch prediction results]

In this epoch, the model only predicts the digit '0' correctly and misclassifies every other digit. In each epoch the model predicts only one specific digit, and that digit changes from epoch to epoch.

This is how I get the dataset.

from sklearn.datasets import fetch_openml
from keras.utils.np_utils import to_categorical
import numpy as np
from sklearn.model_selection import train_test_split
import time

x, y = fetch_openml('mnist_784', version=1, return_X_y=True)
x = (x/255.).astype('float32')
y = to_categorical(y)

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.15, random_state=42)

For this part, this is my model: a two-hidden-layer DNN with ReLU and softmax activations and cross-entropy loss as the error function. I'm not really sure whether my backpropagation is correct; I think something is wrong here.

import numpy as np


class NN():
    def __init__(self, input_size, hidden_1_size, hidden_2_size, output_size):
        self.input_data = np.random.randn(1, input_size)
        self.w1 = np.random.randn(input_size, hidden_1_size)
        self.b1 = np.random.randn(1, hidden_1_size)
        
        self.w2 = np.random.randn(hidden_1_size, hidden_2_size)
        self.b2 = np.random.randn(1, hidden_2_size) 

        self.w3 = np.random.randn(hidden_2_size, output_size)
        self.b3 = np.random.randn(1, output_size)


    def Sigmoid(self, z):
        return np.clip(1 / (1.0 + np.exp(-z)), 1e-8, 1 - (1e-7))

    def Softmax(self, z):
        y_logit = np.exp(z - np.max(z, 1, keepdims=True))
        y = y_logit / np.sum(y_logit, 1, keepdims=True)
        return y

    def Relu(self, z):
        return np.maximum(z, 0)

    def acc_test(self, input_data):
        tmp_h1 = self.Relu(input_data.dot(self.w1) + self.b1)
        tmp_h2 = self.Relu(self.h1_out.dot(self.w2) + self.b2)
        tmp_out = self.Softmax(self.h2_out.dot(self.w3) + self.b3)
        return tmp_out

    # Feed Placeholder

    def forward(self, input_data):

        self.input_data = input_data
        self.h1_out = self.Relu(input_data.dot(self.w1) + self.b1)
        self.h2_out = self.Relu(self.h1_out.dot(self.w2) + self.b2)
        self.output_layer = self.Softmax(self.h2_out.dot(self.w3) + self.b3)

    # Backward Propagation

    def backward(self, target):

        # corss_entropy loss derivative
        Loss_to_z_grad = (self.output_layer - target) # correct

        self.b3_grad = Loss_to_z_grad
        self.w3_grad = self.h2_out.T.dot(Loss_to_z_grad) # correct



        Activation_2_grad = Loss_to_z_grad.dot(self.w3.T) # correct
        Activation_2_grad[Activation_2_grad<0] = 0

        self.b2_grad = Activation_2_grad
        self.w2_grad = self.h1_out.T.dot(Activation_2_grad)

        
        Activation_1_grad = Activation_2_grad.dot(self.w2.T)
        Activation_1_grad[Activation_1_grad<0] = 0     

        self.b1_grad = Activation_1_grad
        self.w1_grad = self.input_data.T.dot(Activation_1_grad)


    # Update Weights
    def update(self, learning_rate=1e-06):
        self.w1 = self.w1 - learning_rate * self.w1_grad
        self.b1 = self.b1 - learning_rate * self.b1_grad

        self.w2 = self.w2 - learning_rate * self.w2_grad
        self.b2 = self.b2 - learning_rate * self.b2_grad

        self.w3 = self.w3 - learning_rate * self.w3_grad
        self.b3 = self.b3 - learning_rate * self.b3_grad

    # Loss Functions
    def cross_entropy(Y, Y_prediction):
        return -(np.matmul(Y, np.log(Y_prediction)) + np.matmul((1-Y), np.log(1-Y_prediction)))

    def print_accuracy(self):
        correct = 0
        loss = 0
        for i in range(y_val.shape[0]):
            self.acc_test(x_val[i])
            index = self.output_layer
            one_hot = 0
            for check in range(y_val[i].shape[0]):
                if y_val[i][check] == 1:
                    one_hot = check
                    break
            if np.argmax(index) == one_hot:
                correct += 1
                # print('correct: ',check)
            # else:
                # print('incorrect: ', check)
        print('accuracy = ', correct/y_val.shape[0])
import random

mnist_nn = NN(input_size=784, hidden_1_size=200, hidden_2_size=200, output_size=10)

for i in range(1000):
    for j in range(2000):
        index = random.randint(0,x_train.shape[0]-1)
        mnist_nn.forward(x_train[[index]])
        mnist_nn.backward(y_train[index])
        mnist_nn.update()
    print(i)
    mnist_nn.print_accuracy()

The accuracy is terribly low since the model only ever predicts one digit. I've seen this article, Neural network always predicts the same class, and I did change ReLU to leaky ReLU, but it doesn't really help.

I think my dataset should be fine, because I used the same dataset to train a DNN with PyTorch and it worked. Also, the initial weights and biases are random values.

I've had a quick look over your code, and if I understand it correctly, there may be a few issues:

  • It seems like you want to do multi-class classification with 10 classes, but your cross-entropy function looks like binary cross-entropy rather than general (categorical) cross-entropy. Also, you're using matrix multiplication, whereas I think you want to sum y * log(y_pred) over the 10 output probabilities and then take the mean across the batch, so you end up with a scalar-valued loss (see the sketch after this list).
  • When applying the ReLU gradient, you should zero out the positions where the actual activation is zero (i.e. where its input was negative), not where the gradient is negative. So Activation_2_grad[Activation_2_grad<0] = 0 should be something like Activation_2_grad[self.h2_out <= 0] = 0 (also sketched below).
  • The rest of the backprop looks okay.
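
For the first point, a minimal NumPy sketch of a general (categorical) cross-entropy, assuming y_true is a batch of one-hot labels and y_pred is the Softmax output with shape (batch, 10):

import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Sum y * log(y_pred) over the 10 class probabilities,
    # then average over the batch to get a scalar loss.
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))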
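
For the second point, a self-contained sketch of the masking (the helper name relu_backward is mine, not from the original code). Inside backward() it would be used as Activation_2_grad = relu_backward(Loss_to_z_grad.dot(self.w3.T), self.h2_out), and analogously for the first hidden layer with self.h1_out:

import numpy as np

def relu_backward(upstream_grad, relu_output):
    # Zero the gradient wherever the ReLU output is zero,
    # i.e. wherever its pre-activation input was <= 0.
    grad = upstream_grad.copy()
    grad[relu_output <= 0] = 0
    return grad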

Try using tanh to make sure the algorithm is working properly. ReLU can be finicky and is only really manageable if you have some control over the dataset, as it tends to need a fairly even distribution of the classes. (Leaky ReLU is hard to gauge as well.)
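
If you go that route, here is a sketch of tanh and the matching derivative for backward() (assumed drop-in helpers, not part of the original post):

import numpy as np

def tanh(z):
    return np.tanh(z)

def tanh_backward(upstream_grad, tanh_output):
    # d/dz tanh(z) = 1 - tanh(z)^2, written in terms of the stored activation.
    return upstream_grad * (1.0 - tanh_output ** 2)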

In terms of whether or not your backprop is working properly, you might be better off comparing against a Keras Sequential model, i.e.:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # flattened 28x28 MNIST images
    tf.keras.layers.Dense(128, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 output classes
])
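
As a rough usage sketch (assuming the x_train / y_train / x_val / y_val arrays prepared above, with one-hot labels), you could compile and fit this reference model and compare its accuracy against your NumPy implementation:

# Categorical cross-entropy matches the one-hot labels from to_categorical above.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=5, batch_size=128)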
