Neural network gets very bad accuracy when using more than one hidden layer

I have created the following neural network:

import numpy as np


def init_weights(m, n=1):
    """
    initialize a matrix/vector of weights with xavier initialization
    :param m: out dim
    :param n: in dim
    :return: matrix/vector of random weights
    """
    limit = (6 / (n * m)) ** 0.5
    weights = np.random.uniform(-limit, limit, size=(m, n))
    if n == 1:
        weights = weights.reshape((-1,))
    return weights


def softmax(v):
    # row-wise softmax over a (batch, classes) matrix
    exp = np.exp(v)
    return exp / exp.sum(axis=1, keepdims=True)


def relu(x):
    return np.maximum(x, 0)


def sign(x):
    # derivative of relu: 1 where x > 0, else 0
    return (x > 0).astype(int)


class Model:
    """
    A class for a neural network model
    """

    def __init__(self, sizes, lr):
        self.lr = lr

        self.weights = []
        self.biases = []
        self.memory = []
        for i in range(len(sizes) - 1):
            self.weights.append(init_weights(sizes[i + 1], sizes[i]))
            self.biases.append(init_weights(sizes[i + 1]))

    def forward(self, X):
        # cache the input and each hidden activation for backprop
        self.memory = [X]
        X = np.dot(self.weights[0], X.T).T + self.biases[0]
        for W, b in zip(self.weights[1:], self.biases[1:]):
            X = relu(X)
            self.memory.append(X)
            X = np.dot(W, X.T).T + b
        return softmax(X)

    def backward(self, y, y_pred):
        #  calculate the errors for each layer
        y = np.eye(y_pred.shape[1])[y]
        errors = [y_pred - y]
        for i in range(len(self.weights) - 1, 0, -1):
            new_err = sign(self.memory[i]) * \
                      np.dot(errors[0], self.weights[i])
            errors.insert(0, new_err)
            
        # update weights
        for i in range(len(self.weights)):
            self.weights[i] -= self.lr *\
                np.dot(self.memory[i].T, errors[i]).T
            self.biases[i] -= self.lr * errors[i].sum(0)
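
For reference, this is roughly how I train the model (a sketch with placeholder data; the real preprocessing isn't shown):

X = np.random.rand(256, 784)          # placeholder features, (n_samples, n_features)
y = np.random.randint(0, 10, 256)     # placeholder integer labels

model = Model(sizes=[784, 128, 10], lr=0.001)
for epoch in range(10):
    y_pred = model.forward(X)         # caches activations, returns softmax outputs
    model.backward(y, y_pred)         # backprop and gradient-descent update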

The data has 10 classes. When using a single hidden layer, the accuracy is almost 40%. When using 2 or 3 hidden layers, the accuracy is around 9-10% from the first epoch and stays there; the accuracy on the training set is also in that range. Is there a problem with my implementation that could cause this?

I'll try to explain in simple words. You're using an unbounded linear error function. Adding hidden layers doesn't help here, because any combination of linear functions is still linear, and now you have more weights to optimize with the same amount of data. To make things worse, ReLU is prone to the dying-ReLU problem, where units get stuck outputting zero and stop learning. Try replacing the linear error function with a non-linear one such as cross-entropy (negative log loss), and replace ReLU with leaky ReLU. Since you haven't shared what data you're using, I won't comment on whether you should be using multiple hidden layers at all. One thing to keep in mind: more hidden layers do not guarantee better accuracy.
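
A minimal sketch of that swap in the question's NumPy style (leaky_relu, leaky_relu_grad, and cross_entropy are illustrative helpers, not part of the original code):

import numpy as np


def leaky_relu(x, alpha=0.01):
    # pass positive values through, scale negatives by a small slope
    return np.where(x > 0, x, alpha * x)


def leaky_relu_grad(x, alpha=0.01):
    # gradient is 1 for positive inputs and alpha for negative ones,
    # so units never go completely dead
    return np.where(x > 0, 1.0, alpha)


def cross_entropy(y_onehot, y_pred, eps=1e-12):
    # mean negative log-likelihood of the true class
    return -np.mean(np.sum(y_onehot * np.log(y_pred + eps), axis=1))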

You asked about improving the accuracy of a machine learning model, which is a very broad question, because the answer varies across model types and data types.

In your case the model is a neural network, whose accuracy depends on several factors. You are trying to optimize accuracy only through the activation functions, the weights, and the number of hidden layers, which is not enough on its own. To increase accuracy you have to consider other factors too; a basic checklist could be the following:

  • Increase Hidden Layers
  • Change Activation Functions
  • Experiment with initial weight initialization
  • Normalize Training Data
  • Scale Training Data
  • Check for Class Imbalance

You are trying to achieve state-of-the-art accuracy while tuning only a few of these factors. I don't know your dataset, since you haven't shown the preprocessing code, but I recommend double-checking it: correctly normalizing the dataset may increase accuracy, check whether your features should be scaled, and, most importantly, check whether one class is heavily over-represented compared to the others, since class imbalance will also lead to poor accuracy.
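
As an illustration of those data checks, a minimal sketch in NumPy (X and y are placeholders for the training features and integer labels):

import numpy as np

X = np.random.rand(100, 20)           # placeholder features, (n_samples, n_features)
y = np.random.randint(0, 10, 100)     # placeholder integer labels

# standardize each feature to zero mean and unit variance
X_norm = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# or scale each feature into [0, 1] (min-max scaling)
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-8)

# check for class imbalance: the counts should be roughly equal
classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes, counts)))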

For more details check this; it contains the mathematical proofs and an explanation of how these things affect your ML model's accuracy.
