Improving a simple 1-layer neural network

Question

I've created my own very simple 1-layer neural network for binary classification problems. The input data points are multiplied by the weights, the products are summed together with a bias (the weighted sum), and the result is fed through an activation function (such as relu or sigmoid). That is the prediction output. There are no other layers (i.e. hidden layers) involved.
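For illustration, the forward pass described above amounts to something like the following. This is a minimal sketch in plain Python; the name `forward` and the example weights are placeholders chosen here, not the names used in the full listing further down.

```python
import math

def sigmoid(z):
    # Squash any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, weights, bias):
    # Weighted sum: one weight per feature, plus a single bias term.
    weighted_sum = sum(x * w for x, w in zip(features, weights)) + bias
    # With no hidden layers, the activation output is the prediction.
    return sigmoid(weighted_sum)

# With all-zero parameters the weighted sum is 0, so the sigmoid returns 0.5.
print(forward([3.0, 1.5], [0.0, 0.0], 0.0))  # 0.5
```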

Just for my own understanding of the mathematical side, I didn't want to use an existing library/package (e.g. Keras, PyTorch, Scikit-learn, etc.), but simply wanted to create a neural network using plain Python code. The model is created inside a method (simple_1_layer_classification_NN) that takes the parameters needed to train it. However, I encountered some problems, so I've listed my questions below along with my code.

P.S. I really apologise for including such a large portion of code, but I didn't know how else to ask the questions without referencing the relevant code.

The questions:

1 - When I passed a training dataset to the network, I found that the final average accuracy differed completely for different numbers of epochs, with no clear pattern pointing to an optimal number of epochs. I kept the other parameters the same: learning rate = 0.5, activation = sigmoid (since it's 1 layer, being both the input and output layer, with no hidden layers involved; I've read that sigmoid suits an output layer better than relu), cost function = squared error. Here are the results for different epochs:

Epoch = 100,000. Average Accuracy: 50.10541638874056

Epoch = 500,000. Average Accuracy: 50.08965597645948

Epoch = 1,000,000. Average Accuracy: 97.56879179064482

Epoch = 7,500,000. Average Accuracy: 49.994692515332524

Epoch = 750,000. Average Accuracy: 77.0028368954157

Epoch = 100. Average Accuracy: 48.96967591507596

Epoch = 500. Average Accuracy: 48.20721972881673

Epoch = 10,000. Average Accuracy: 71.58066454336122

Epoch = 50,000. Average Accuracy: 62.52998222597177

Epoch = 100,000. Average Accuracy: 49.813675726563424

Epoch = 1,000,000. Average Accuracy: 49.993141329926374

As you can see, there doesn't seem to be any clear pattern. I tried 1 million epochs and got 97.6% accuracy. Then I tried 7.5 million epochs and got 50% accuracy. Half a million epochs also got 50% accuracy. 100 epochs resulted in 49% accuracy. Then the really odd one: I tried 1 million epochs again and got 50%.

So I'm sharing my code below, because I don't believe the network is doing any learning. It just seems to be making random guesses. I applied back-propagation and partial derivatives to optimise the weights and bias, so I'm not sure where I'm going wrong with my code.

2 - One of the parameters I included in the parameter list of the simple_1_layer_classification_NN method is input_dimension. At first I thought it would be needed to work out the number of weights required for the input layer. Then I realised that, as long as the dataset_input_matrix (matrix of features) argument is passed to the method, I can use a random index to pull a random observation vector from the matrix (input_observation_vector = dataset_input_matrix[ri]) and then loop through the observation to access each feature. The length of the observation vector tells me exactly how many weights are required, because each feature needs one weight (as its coefficient). So len(input_observation_vector) gives the number of weights required in the input layer, and therefore I don't need to ask the user to pass an input_dimension argument to the method. So my question is simply: is there any need or reason to include an input_dimension parameter, when it can be worked out by evaluating the length of an observation vector from the input matrix?
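To make the idea concrete, deriving the dimension from the data could look roughly like this. It is a sketch only: `infer_input_dimension` is a hypothetical helper, not part of my class, and the check that every row has the same length is an extra assumption about what a sensible version should do.

```python
def infer_input_dimension(dataset_input_matrix):
    # Hypothetical helper: one weight is needed per feature, so the length
    # of any observation vector gives the number of weights.
    if len(dataset_input_matrix) == 0:
        raise ValueError("cannot infer the input dimension of an empty dataset")
    dimension = len(dataset_input_matrix[0])
    # Guard against ragged input, where rows disagree on the feature count.
    if any(len(row) != dimension for row in dataset_input_matrix):
        raise ValueError("all observation vectors must have the same length")
    return dimension

# The first three training rows from the listing below (length, width).
print(infer_input_dimension([[3, 1.5], [2, 1], [4, 1.5]]))  # 2
```

This works the same way on nested lists and on a 2-D NumPy array, since both support len() on the matrix and on each row.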

3 - When I try to plot the array of cost values with plt.plot(y_costs), nothing shows up. A cost value is produced in every epoch, but it is appended to the costs array only every 50 epochs, to avoid adding too many elements to the array when the number of epochs is really high. The relevant lines are:

if i % 50 == 0:
    costs.append(cost)

When I did some debugging, I found that the costs array is empty after the method returns. I'm not sure why that is, when it should be appending a cost value every 50th epoch. I've probably overlooked something really silly that I can't see.

Many thanks in advance, and apologies again for the long piece of code.


from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import sys
# import os

class NN_classification:

    def __init__(self):
        self.bias = float()
        self.weights = []
        self.chosen_activation_func = None
        self.chosen_cost_func = None
        self.train_average_accuracy = int()
        self.test_average_accuracy = int()

    # -- Activation functions --: 
    def sigmoid(x):
        return 1/(1 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    # -- Derivative of activation functions --:
    def sigmoid_derivation(x): 
        return NN_classification.sigmoid(x) * (1-NN_classification.sigmoid(x))

    def relu_derivation(x):
        if x <= 0:
            return 0
        else:
            return 1

    # -- Squared-error cost function --:
    def squared_error(pred, target):
        return np.square(pred - target)

    # -- Derivative of squared-error cost function --:
    def squared_error_derivation(pred, target):
        return 2 * (pred - target)

     # --- neural network structure diagram --- 

    #    O  output prediction
    #   / \   w1, w2, b
    #  O   O  datapoint 1, datapoint 2

    def simple_1_layer_classification_NN(self, dataset_input_matrix, output_data_labels, input_dimension, epochs, activation_func='sigmoid', learning_rate=0.2, cost_func='squared_error'):
        weights = []
        bias = int()
        cost = float()
        costs = []
        dCost_dWeights = []
        chosen_activation_func_derivation = None
        chosen_cost_func = None
        chosen_cost_func_derivation = None
        correct_pred = int()
        incorrect_pred = int()

        # store the chosen activation function to use to it later on in the activation calculation section and in the 'predict' method
        # Also the same goes for the derivation section.        
        if activation_func == 'sigmoid':
            self.chosen_activation_func = NN_classification.sigmoid
            chosen_activation_func_derivation = NN_classification.sigmoid_derivation
        elif activation_func == 'relu':
            self.chosen_activation_func = NN_classification.relu
            chosen_activation_func_derivation = NN_classification.relu_derivation
        else:
            print("Exception error - no activation function utilised, in training method", file=sys.stderr)
            return   

        # store the chosen cost function to use to it later on in the cost calculation section.
        # Also the same goes for the cost derivation section.    
        if cost_func == 'squared_error':
            chosen_cost_func = NN_classification.squared_error
            chosen_cost_func_derivation = NN_classification.squared_error_derivation
        else:
           print("Exception error - no cost function utilised, in training method", file=sys.stderr)
           return

        # Set initial network parameters (weights & bias):
        # Will initialise the weights to a uniform distribution and ensure the numbers are small close to 0.
        # We need to loop through all the weights to set them to a random value initially.
        for i in range(input_dimension):
            # create random numbers for our initial weights (connections) to begin with. 'rand' method creates small random numbers. 
            w = np.random.rand()
            weights.append(w)

        # create a random number for our initial bias to begin with.
        bias = np.random.rand()

        # We perform the training based on the number of epochs specified
        for i in range(epochs):
            # create random index
            ri = np.random.randint(len(dataset_input_matrix))
            # Pick random observation vector: pick a random observation vector of independent variables (x) from the dataset matrix
            input_observation_vector = dataset_input_matrix[ri]

            # reset weighted sum value at the beginning of every epoch to avoid incrementing the previous observations weighted-sums on top. 
            weighted_sum = 0

            # Loop through all the independent variables (x) in the observation
            for i in range(len(input_observation_vector)):
                # Weighted_sum: we take each independent variable in the entire observation, add weight to it then add it to the subtotal of weighted sum
                weighted_sum += input_observation_vector[i] * weights[i]

            # Add Bias: add bias to weighted sum
            weighted_sum += bias

            # Activation: process weighted_sum through activation function
            activation_func_output = self.chosen_activation_func(weighted_sum)    

            # Prediction: Because this is a single layer neural network, so the activation output will be the same as the prediction
            pred = activation_func_output

            # Cost: the cost function to calculate the prediction error margin
            cost = chosen_cost_func(pred, output_data_labels[ri])
            # Also calculate the derivative of the cost function with respect to prediction
            dCost_dPred = chosen_cost_func_derivation(pred, output_data_labels[ri])

            # Derivative: bringing derivative from prediction output with respect to the activation function used for the weighted sum.
            dPred_dWeightSum = chosen_activation_func_derivation(weighted_sum)

            # Bias is just a number on its own added to the weighted sum, so its derivative is just 1
            dWeightSum_dB = 1

            # The derivative of the Weighted Sum with respect to each weight is the input data point / independant variable it's multiplied by. 
            # Therefore I simply assigned the input data array to another variable I called 'dWeightedSum_dWeights'
            # to represent the array of the derivative of all the weights involved. I could've used the 'input_sample'
            # array variable itself, but for the sake of readibility, I created a separate variable to represent the derivative of each of the weights.
            dWeightedSum_dWeights = input_observation_vector

            # Derivative chaining rule: chaining all the derivative functions together (chaining rule)
            # Loop through all the weights to workout the derivative of the cost with respect to each weight:
            for dWeightedSum_dWeight in dWeightedSum_dWeights:
                dCost_dWeight = dCost_dPred * dPred_dWeightSum * dWeightedSum_dWeight
                dCost_dWeights.append(dCost_dWeight)

            dCost_dB = dCost_dPred * dPred_dWeightSum * dWeightSum_dB

            # Backpropagation: update the weights and bias according to the derivatives calculated above.
            # In other word we update the parameters of the neural network to correct parameters and therefore 
            # optimise the neural network prediction to be as accurate to the real output as possible
            # We loop through each weight and update it with its derivative with respect to the cost error function value. 
            for i in range(len(weights)):
                weights[i] = weights[i] - learning_rate * dCost_dWeights[i]

            bias = bias - learning_rate * dCost_dB

            # for each 50th loop we're going to get a summary of the
            # prediction compared to the actual ouput
            # to see if the prediction is as expected.
            # Anything in prediction above 0.5 should match value 
            # 1 of the actual ouptut. Any prediction below 0.5 should
            # match value of 0 for actual output 
            if i % 50 == 0:
                costs.append(cost)

            # Compare prediction to target
            error_margin = np.sqrt(np.square(pred - output_data_labels[ri]))
            accuracy = (1 - error_margin) * 100
            self.train_average_accuracy += accuracy

            # Evaluate whether guessed correctly or not based on classification binary problem 0 or 1 outcome. So if prediction is above 0.5 it guessed 1 and below 0.5 it guessed incorrectly. If it's dead on 0.5 it is incorrect for either guesses. Because it's no exactly a good guess for either 0 or 1. We need to set a good standard for the neural net model.
            if (error_margin < 0.5) and (error_margin >= 0):
                correct_pred += 1 
            elif (error_margin >= 0.5) and (error_margin <= 1):
                incorrect_pred += 1
            else:
                print("Exception error - 'margin error' for 'predict' method is out of range. Must be between 0 and 1, in training method", file=sys.stderr)
                return
        # store the final optimised weights to the weights instance variable so it can be used in the predict method.
        self.weights = weights

        # store the final optimised bias to the weights instance variable so it can be used in the predict method.
        self.bias = bias

        # Calculate average accuracy from the predictions of all obervations in the training dataset
        self.train_average_accuracy /= epochs

        # Print out results 
        print('Average Accuracy: {}'.format(self.train_average_accuracy))
        print('Correct predictions: {}, Incorrect Predictions: {}'.format(correct_pred, incorrect_pred))
        print('costs = {}'.format(costs))
        y_costs = np.array(costs)
        plt.plot(y_costs)
        plt.show()

from numpy import array
#define array of dataset
# each observation vector has 3 datapoints or 3 columns: length, width, and outcome label (0, 1 to represent blue flower and red flower respectively).  
data = array([[3,   1.5, 1],
        [2,   1,   0],
        [4,   1.5, 1],
        [3,   1,   0],
        [3.5, 0.5, 1],
        [2,   0.5, 0],
        [5.5, 1,   1],
        [1,   1,   0]])

# separate data: split input, output, train and test data.
X_train, y_train, X_test, y_test = data[:6, :-1], data[:6, -1], data[6:, :-1], data[6:, -1]

nn_model = NN_classification()

nn_model.simple_1_layer_classification_NN(X_train, y_train, 2, 1000000, learning_rate=0.5)

Answer 1

Have you tried a smaller learning rate? Your network may be skipping over local minima because the rate is too high.

Here's an article that goes more in-depth on learning rates: https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10

The reason the cost never gets appended is that you are using the same variable, 'i', in nested for loops.

    # We perform the training based on the number of epochs specified
    for i in range(epochs):
        # create random index
        ri = np.random.randint(len(dataset_input_matrix))
        # Pick random observation vector: pick a random observation vector of independent variables (x) from the dataset matrix
        input_observation_vector = dataset_input_matrix[ri]

        # reset weighted sum value at the beginning of every epoch to avoid incrementing the previous observations weighted-sums on top.
        weighted_sum = 0

        # Loop through all the independent variables (x) in the observation
        for i in range(len(input_observation_vector)):
            # Weighted_sum: we take each independent variable in the entire observation, add weight to it then add it to the subtotal of weighted sum
            weighted_sum += input_observation_vector[i] * weights[i]

        # Add Bias: add bias to weighted sum
        weighted_sum += bias

        # Activation: process weighted_sum through activation function
        activation_func_output = self.chosen_activation_func(weighted_sum)

        # Prediction: Because this is a single layer neural network, so the activation output will be the same as the prediction
        pred = activation_func_output

        # Cost: the cost function to calculate the prediction error margin
        cost = chosen_cost_func(pred, output_data_labels[ri])
        # Also calculate the derivative of the cost function with respect to prediction
        dCost_dPred = chosen_cost_func_derivation(pred, output_data_labels[ri])

        # Derivative: bringing derivative from prediction output with respect to the activation function used for the weighted sum.
        dPred_dWeightSum = chosen_activation_func_derivation(weighted_sum)

        # Bias is just a number on its own added to the weighted sum, so its derivative is just 1
        dWeightSum_dB = 1

        # The derivative of the Weighted Sum with respect to each weight is the input data point / independant variable it's multiplied by.
        # Therefore I simply assigned the input data array to another variable I called 'dWeightedSum_dWeights'
        # to represent the array of the derivative of all the weights involved. I could've used the 'input_sample'
        # array variable itself, but for the sake of readibility, I created a separate variable to represent the derivative of each of the weights.
        dWeightedSum_dWeights = input_observation_vector

        # Derivative chaining rule: chaining all the derivative functions together (chaining rule)
        # Loop through all the weights to workout the derivative of the cost with respect to each weight:
        for dWeightedSum_dWeight in dWeightedSum_dWeights:
            dCost_dWeight = dCost_dPred * dPred_dWeightSum * dWeightedSum_dWeight
            dCost_dWeights.append(dCost_dWeight)

        dCost_dB = dCost_dPred * dPred_dWeightSum * dWeightSum_dB

        # Backpropagation: update the weights and bias according to the derivatives calculated above.
        # In other word we update the parameters of the neural network to correct parameters and therefore
        # optimise the neural network prediction to be as accurate to the real output as possible
        # We loop through each weight and update it with its derivative with respect to the cost error function value.
        for i in range(len(weights)):
            weights[i] = weights[i] - learning_rate * dCost_dWeights[i]

        bias = bias - learning_rate * dCost_dB

        # for each 50th loop we're going to get a summary of the
        # prediction compared to the actual ouput
        # to see if the prediction is as expected.
        # Anything in prediction above 0.5 should match value
        # 1 of the actual ouptut. Any prediction below 0.5 should
        # match value of 0 for actual output

This causes 'i' to always be 1 by the time it reaches the if statement (the last inner loop, over the two weights, leaves it at 1), so i % 50 is never 0:

        if i % 50 == 0:
            costs.append(cost)

        # Compare prediction to target
        error_margin = np.sqrt(np.square(pred - output_data_labels[ri]))
        accuracy = (1 - error_margin) * 100
        self.train_average_accuracy += accuracy
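The shadowing can be reproduced in isolation. The sketch below mimics only the loop structure of the posted method; the function names are made up for this demonstration, and renaming the counters to `epoch` and `j` is just one way to keep them apart.

```python
def index_seen_by_check(epochs, n_weights):
    # Same structure as the posted code: the inner loop reuses `i`.
    seen = []
    for i in range(epochs):
        for i in range(n_weights):  # overwrites the epoch counter
            pass
        seen.append(i)  # the value that `if i % 50 == 0` would test
    return seen

def sampled_epochs(epochs, n_weights, every=50):
    # Distinct names, so the epoch counter survives the inner loop.
    sampled = []
    for epoch in range(epochs):
        for j in range(n_weights):
            pass
        if epoch % every == 0:
            sampled.append(epoch)
    return sampled

print(set(index_seen_by_check(200, 2)))  # {1}
print(sampled_epochs(200, 2))            # [0, 50, 100, 150]
```

Note that the outer loop still runs the right number of times in the first version, because Python's for loop assigns the next value from range() on each iteration regardless of what the body did to `i`. Only the code after the inner loops sees the wrong value.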

Edit

So I tried training the model 1000 times with random learning rates between 0 and 1, and the initial learning rate doesn't seem to make any difference. Only 0.3% of these runs achieved an accuracy above 60%, and none of them were above 70%. Then I ran the same test with an adaptive learning rate:

# Modify the learning rate based on the cost
# Placed just before the bias is calculated
learning_rate = 0.999 * learning_rate + 0.1 * cost

This results in about 10-12% of the models having an accuracy above 60%, and about 2.5% of them above 70%.
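One more thing that may be worth checking, since it could explain why the learning rate makes so little difference: in the posted code, dCost_dWeights is created once before the epoch loop and only ever appended to, so dCost_dWeights[i] in the update step appears to always read the gradients from the very first epoch. Below is a minimal sketch of a single training step that rebuilds the gradient every time. It is plain Python rather than a drop-in patch, and the name `sgd_step` and the short driver loop are my own, not from the question.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, bias, x, target, learning_rate):
    # Forward pass for a single sigmoid unit.
    z = sum(w_j * x_j for w_j, x_j in zip(weights, x)) + bias
    pred = sigmoid(z)
    # Chain rule: dCost/dPred * dPred/dWeightedSum, shared by weights and bias.
    delta = 2.0 * (pred - target) * pred * (1.0 - pred)
    # The gradient is computed from scratch for this observation only.
    new_weights = [w_j - learning_rate * delta * x_j
                   for w_j, x_j in zip(weights, x)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

# Training rows and labels from the question.
X = [[3, 1.5], [2, 1], [4, 1.5], [3, 1], [3.5, 0.5], [2, 0.5]]
y = [1, 0, 1, 0, 1, 0]

random.seed(0)
w, b = [random.random(), random.random()], random.random()
for epoch in range(20000):
    k = random.randrange(len(X))
    w, b = sgd_step(w, b, X[k], y[k], learning_rate=0.2)

preds = [int(sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, row)) + b) > 0.5)
         for row in X]
print(preds, y)
```

I have not benchmarked this against the numbers above, so treat it as something to try rather than a confirmed fix. The equivalent change in the original method would be to reset dCost_dWeights to an empty list at the top of each epoch.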
