Simple Neural Network built from scratch is not learning

I have implemented a neural network class that always has just a single hidden layer, using no libraries - not even numpy. I have done everything the way I understood it should be done, but it is not learning at all; the loss is actually continuously increasing, and I cannot find where I have gone wrong, even after looking at many examples online.

Here is my MLP class and a demo of it attempting to learn the XOR function:

import random
from math import exp


class MLP:

    def __init__(self, numInputs, numHidden, numOutputs):
        # MLP architecture sizes
        self.numInputs = numInputs
        self.numHidden = numHidden
        self.numOutputs = numOutputs

        # MLP weights
        self.IH_weights = [[random.random() for i in range(numHidden)] for j in range(numInputs)]
        self.HO_weights = [[random.random() for i in range(numOutputs)] for j in range(numHidden)]

        # Gradients corresponding to weight matrices computed during backprop
        self.IH_gradients = [[0 for i in range(numHidden)] for j in range(numInputs)]
        self.HO_gradients = [[0 for i in range(numOutputs)] for j in range(numHidden)]

        # Input, hidden and output neuron values
        self.I = None
        self.H = [0 for i in range(numHidden)]
        self.O = [0 for i in range(numOutputs)]

        self.H_deltas = [0 for i in range(numHidden)]
        self.O_deltas = [0 for i in range(numOutputs)]

    # Sigmoid
    def activation(self, x):
        return 1 / (1 + exp(-x))

    # Derivative of Sigmoid
    def activationDerivative(self, x):
        return x * (1 - x)

    # Squared Error
    def calculateError(self, prediction, label):
        return (prediction - label) ** 2

    def forward(self, input):
        self.I = input
        for i in range(self.numHidden):
            for j in range(self.numInputs):
                self.H[i] += self.I[j] * self.IH_weights[j][i]
            self.H[i] = self.activation(self.H[i])

        for i in range(self.numOutputs):
            for j in range(self.numHidden):
                self.O[i] += self.activation(self.H[j] * self.HO_weights[j][i])
            self.O[i] = self.activation(self.O[i])

        return self.O

    def backwards(self, label):
        if label != list:
            label = [label]

        error = 0
        for i in range(self.numOutputs):
            neuronError = self.calculateError(self.O[i], label[i])
            error += neuronError
            self.O_deltas[i] = neuronError * self.activationDerivative(self.O[i])
            for j in range(self.numHidden):
                self.HO_gradients[j][i] += self.O_deltas[i] * self.H[j]

        for i in range(self.numHidden):
            neuronError = 0
            for j in range(self.numOutputs):
                neuronError += self.HO_weights[i][j] * self.O_deltas[j]
            self.H_deltas[i] = neuronError * self.activationDerivative(self.H[i])
            for j in range(self.numInputs):
                self.IH_gradients[j][i] += self.H_deltas[i] * self.I[j]

        return error

    def updateWeights(self, learningRate):
        for i in range(self.numInputs):
            for j in range(self.numHidden):
                self.IH_weights[i][j] += learningRate * self.IH_gradients[i][j]

        for i in range(self.numHidden):
            for j in range(self.numOutputs):
                self.HO_weights[i][j] += learningRate * self.HO_gradients[i][j]

        self.IH_gradients = [[0 for i in range(self.numHidden)] for j in range(self.numInputs)]
        self.HO_gradients = [[0 for i in range(self.numOutputs)] for j in range(self.numHidden)]


data = [
    [[0, 0], 0],
    [[0, 1], 1],
    [[1, 0], 1],
    [[1, 1], 0]
]

mlp = MLP(2, 5, 1)

for epoch in range(100):
    epochError = 0
    for i in range(len(data)):
        mlp.forward(data[i][0])
        epochError += mlp.backwards(data[i][1])
    print(epochError / len(data))
    mlp.updateWeights(0.001)

If I understand your implementation correctly, I think the problem is in how the weight updates are computed in the backwards function: the update should be the error (not the squared error) multiplied by the sigmoid derivative, so I would take another look at / redo that calculation.
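As a minimal sketch of what that change might look like, here is the output-layer loop inside backwards, keeping the same names as in the question and using the raw difference between prediction and label as the error term (the squared error is only accumulated for reporting):

for i in range(self.numOutputs):
    # raw signed error drives the delta, not the squared error
    neuronError = self.O[i] - label[i]
    error += neuronError ** 2
    self.O_deltas[i] = neuronError * self.activationDerivative(self.O[i])
    for j in range(self.numHidden):
        self.HO_gradients[j][i] += self.O_deltas[i] * self.H[j]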

How did you go with this? I showed it to a friend - we both found your goal of implementing the algorithm without much abstraction edifying, although trying to find errors this way is difficult.

The improvement he found is that updateWeights needs to be a negative feedback loop, so change "+=" to "-=" in two lines, giving:

self.IH_weights[i][j] -= learningRate * self.IH_gradients[i][j]

and

self.HO_weights[i][j] -= learningRate * self.HO_gradients[i][j]
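Combining the two sign changes, a sketch of the full corrected updateWeights (everything else kept as in the question) would be:

def updateWeights(self, learningRate):
    # descend the gradient: subtract, rather than add, the accumulated gradients
    for i in range(self.numInputs):
        for j in range(self.numHidden):
            self.IH_weights[i][j] -= learningRate * self.IH_gradients[i][j]

    for i in range(self.numHidden):
        for j in range(self.numOutputs):
            self.HO_weights[i][j] -= learningRate * self.HO_gradients[i][j]

    # reset accumulated gradients for the next round of updates
    self.IH_gradients = [[0 for i in range(self.numHidden)] for j in range(self.numInputs)]
    self.HO_gradients = [[0 for i in range(self.numOutputs)] for j in range(self.numHidden)]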

The other factor is increasing the learning rate. With these changes, the error descends to about 16% (for me; I may have made another change that I am not seeing) before it begins to climb, asymptoting to 27% - maybe due to overtraining with a learning rate that is too high.

I made the learning rate dependent on the epoch

mlp.updateWeights(0.1/(0.01 * (epoch+1)))

and it decreases steadily and stabilizes at 0.161490...
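For context, a sketch of the training loop with that schedule, reusing the mlp and data definitions from the question:

mlp = MLP(2, 5, 1)

for epoch in range(100):
    epochError = 0
    for i in range(len(data)):
        mlp.forward(data[i][0])
        epochError += mlp.backwards(data[i][1])
    print(epochError / len(data))
    # learning rate decays as the epochs progress
    mlp.updateWeights(0.1 / (0.01 * (epoch + 1)))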

But if you get the prediction from 'forward', it's always predicting 0.66 - the inputs have been wiped away. So... that's bad.
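A minimal sketch of that check (hypothetical; it assumes the trained mlp and the data list from the question, and produces the output below):

# print the network's prediction for each training example
for inputs, label in data:
    prediction = mlp.forward(inputs)
    print(" - Input Data:", inputs, "| Prediction:", prediction, "| Truth:", label)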

 - Input Data: [0, 0] | Prediction: [0.6610834017294481] |Truth: 0
 - Input Data: [0, 1] | Prediction: [0.6616502691118376] |Truth: 1
 - Input Data: [1, 0] | Prediction: [0.6601936411430607] |Truth: 1
 - Input Data: [1, 1] | Prediction: [0.6596122207209283] |Truth: 0
