
Neural Network seems to be getting stuck on a single output with each execution

I created a neural network to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0), is built with numpy, and does not learn, as I believe I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.

When I execute the code, the amount of test data it estimates correctly sits around 48/1000. This happens to be the average number of data points per category if you split 1000 test data points between 21 categories. Looking at the network output, you can see that the network seems to just start picking a single output value for every input. For example, it may pick -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thanks!

import random
import numpy as np
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize

        # print(self.layer1)
        # print()
        # print(self.layer2)

        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):

        #Propagate forward through the network as if doing this by hand.
        #first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))

        #second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))

        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with the highest activation will be the output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)

        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        #apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0 #calculate loss for each minibatch

                #Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                    #adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output))*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                #print(adjustWeights)
                self.layer2 += adjustWeights
                #when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns the number of test inputs which the network evaluates correctly.
        The output is assumed to be the neuron in the output layer with the highest activation.
        :param test_data: test data set identical in form to the training data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct+=1
        return(correct)

def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for the Sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        #parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000,2000)/10)
        y_vals.append(round(math.sin(x_vals[i]),1))
    return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate

sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))

I haven't checked all of your code thoroughly, but some problems are clearly visible:

  • The * operator does not perform matrix multiplication in numpy; you have to use numpy.dot. This affects, for example, lines such as network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc. (see the sketch after this list).

  • It seems you are approaching the problem as classification (picking 1 of 21 classes), yet training with an L2 loss. That is somewhat mixed up. You have two options: either stick with classification and use a cross-entropy loss function, or do regression (i.e. predict a numeric value) with the L2 loss.

  • You should definitely extract the sigmoid function so that you don't write the same expression over and over again:

     def sigmoid(z):
         return 1 / (1 + np.exp(-z))

     def sigmoid_derivative(x):
         return sigmoid(x) * (1 - sigmoid(x))
  • You apply exactly the same update to self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how backpropagation actually works.
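To illustrate the first, second and third points, here is a minimal, hypothetical sketch (it is not code from the question or the answer) of the feedforward pass written with numpy.dot and an extracted sigmoid helper, plus a one-hot target and cross-entropy loss for the classification option. The layer shapes mirror the OP's Network(1, 21, 21); the target index chosen below is arbitrary.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical sizes mirroring Network(1, 21, 21)
inputLayerSize, hiddenLayerSize, outputLayerSize = 1, 21, 21
layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)   # (21, 1)
layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)  # (21, 21)

x = np.array([[10.0]])                    # column-vector input, shape (1, 1)
hidden = sigmoid(np.dot(layer1, x))       # np.dot instead of *, shape (21, 1)
output = sigmoid(np.dot(layer2, hidden))  # shape (21, 1)

# Classification option: one-hot target plus a per-neuron cross-entropy loss
target = np.zeros((outputLayerSize, 1))
target[5] = 1.0                           # arbitrary example class
eps = 1e-12                               # avoids log(0)
cross_entropy = -np.sum(target * np.log(output + eps)
                        + (1 - target) * np.log(1 - output + eps))
print(cross_entropy)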

I edited how my loss function was integrated into my function and implemented gradient descent correctly. I also removed the use of mini-batches and simplified what my network is trying to do. I now have a network that attempts to classify an input as positive or negative.

Some very useful guides I used to solve this problem:

Chapters 1 and 2 of Neural Networks and Deep Learning by Michael Nielsen, available for free at http://neuralnetworksanddeeplearning.com/chap1.html. The book gives a thorough explanation of how neural networks work, including a breakdown of the math behind their execution.

Backpropagation from the Beginning, by Erik Hallström, linked by Maxim. https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d Not as thorough as the guide above, but I kept both open at the same time, since this guide gets more to the point about what is important and how to apply the mathematical formulas that are explained in depth in Nielsen's book.

How to Build a Simple Neural Network in 9 Lines of Python Code, https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of-python-code-cc8f23647ca1 A useful and quick introduction to some neural network basics.

Here is my (now working) code:

import random
import numpy as np
import scipy
import math
class Network(object):

    def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):

        #Layers represented both by their weights array and activation and inputsums vectors.
        self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
        self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)

        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
        self.layer2_inputsums = np.zeros((outputLayerSize, 1))

        self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
        self.layer2_errorsignals = np.zeros((outputLayerSize, 1))

        self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
        self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize
        print()
        print(self.layer1)
        print()
        print(self.layer2)
        print()
        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):
        #Calculate inputsum and activations for each neuron in the first layer
        for neuron in range(self.hiddenLayerSize):
            self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
            self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])

        # Calculate inputsum and activations for each neuron in the second layer. Notice that each neuron in the second layer is represented by a
        # weights vector, consisting of all weights leading out of the kth neuron in the (l-1)th layer to the jth neuron in layer l.
        self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])

        return self.layer2_activations

    def interpreted_output(self, network_input):
        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
        self.feedforward(network_input)
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
        return(outputs[np.argmax(self.layer2_activations)])

    # def build_expected_output(self, training_data):
    #     #Views expected output number y for each x to generate an expected output vector from the network
    #     index=0
    #     for pair in training_data:
    #         expected_output_vector = np.zeros((self.outputLayerSize,1))
    #         x = training_data[0]
    #         y = training_data[1]
    #         for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
    #             if y == i / 10:
    #                 expected_output_vector[i] = 1
    #                 #expect the target category to be a 1.
    #                 break
    #         training_data[index][1] = expected_output_vector
    #         index+=1
    #     return training_data

    def train(self, training_data, learn_rate):
        self.backpropagate(training_data, learn_rate)

    def backpropagate(self, train_data, learn_rate):
        #Perform for each x,y pair.
        for datapair in range(len(train_data)):
            x = train_data[datapair][0]
            y = train_data[datapair][1]
            self.feedforward(x)
           # print("l2a " + str(self.layer2_activations))
           # print("l1a " + str(self.layer1_activations))
           # print("l2 " + str(self.layer2))
           # print("l1 " + str(self.layer1))
            for neuron in range(self.outputLayerSize):
                #Calculate first error equation for error signals of output layer neurons
                self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])


            #Use recursive formula to calculate error signals of hidden layer neurons
            self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
            #print(self.layer1_errorsignals)
            # for neuron in range(self.hiddenLayerSize):
            #     #Use recursive formula to calculate error signals of hidden layer neurons
            #     self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])

            #Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
            #(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
            #Update all weights for network at each iteration of a training pair.

            #Update weights in second layer
            for neuron in range(self.outputLayerSize):
                for weight in range(self.hiddenLayerSize):
                    self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)

            self.layer2 += self.layer2_deltaw

            #Update weights in first layer
            for neuron in range(self.hiddenLayerSize):
                self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)

            self.layer1 += self.layer1_deltaw
            #Comment/Uncomment to enable error evaluation.
            #print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
            # print("l2a " + str(self.layer2_activations))
            # print("l1a " + str(self.layer1_activations))
            # print("l1 " + str(self.layer1))
            # print("l2 " + str(self.layer2))



    def evaluate(self, test_data):
        error = 0
        for x, y in test_data:
            #x is integer, y is single element np.array
            output = self.feedforward(x)
            error += y - output
        return error


#eval function for sin(x)
    # def evaluate(self, test_data):
    #     """
    #     Returns number of test inputs which network evaluates correctly.
    #     The output is assumed to be the neuron in the output layer with the highest activation.
    #     :param test_data: test data set identical in form to train data set.
    #     :return: integer sum
    #     """
    #     correct = 0
    #     for x, y in test_data:
    #         outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
    #                                          1)]  # range(-10, 11, 1)
    #         newy = outputs[np.argmax(y)]
    #         output = self.interpreted_output(x)
    #         #print("output: " + str(output))
    #         if output == newy:
    #             correct+=1
    #     return(correct)

    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        return (1 - self.sigmoid(z)) * self.sigmoid(z)

def build_simple_data(data_points):
    x_vals = []
    y_vals = []
    for each in range(data_points):
        x = random.randint(-3,3)
        expected_output_vector = np.zeros((1, 1))
        if x > 0:
            expected_output_vector[[0]] = 1
        else:
            expected_output_vector[[0]] = 0

        x_vals.append(x)
        y_vals.append(expected_output_vector)
    print(list(zip(x_vals,y_vals)))
    print()
    return (list(zip(x_vals,y_vals)))


simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))

def test_network(iterations,net,training_points):
    """
    Casually evaluates pre and post test
    :param iterations: number of trials to be run
    :param net: name of network to evaluate.
    :param training_points: size of training data to be used
    :return: four 1x1 arrays.
    """
    pretest_negative = 0
    pretest_positive = 0
    posttest_negative = 0
    posttest_positive = 0
    for each in range(iterations):
        pretest_negative += net.feedforward(-10)
        pretest_positive += net.feedforward(10)
    net.train(build_simple_data(training_points),.1)
    for each in range(iterations):
        posttest_negative += net.feedforward(-10)
        posttest_positive += net.feedforward(10)
    return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)

print(test_network(10000, simpleNet, 10000))

Although this code is quite different from the code posted in the OP, there is one particularly interesting difference. In the original feedforward method, notice:

 #second layer's output activations use layer1's activations as input:
    for neuron in range(self.outputLayerSize):
        for weight in range(self.hiddenLayerSize):
            self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
        self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))

self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

looks like

self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]

in the updated code. This line performs the dot product between each weight vector and the input vector (the activations coming from layer 1) to arrive at the input_sum for a neuron, commonly referred to as z (think sigmoid(z)). In my network, the derivative of the sigmoid function, sigmoid_prime, is used to calculate the gradient of the cost function with respect to all the weights, by multiplying sigmoid_prime(z) by the network error between the actual and the expected output. If z is very large (and positive), the neuron's activation will be very close to 1. That means the network is confident that this neuron should be activating. The same is true if z is very negative. The network then does not want to radically adjust weights it is already happy with, so the scale of the change to each of a neuron's weights is given by the gradient of sigmoid(z), i.e. sigmoid_prime(z). A very large z means a very small gradient and a very small change applied to the weights (the gradient of the sigmoid is maximized at z = 0, when the network is unconfident about how a neuron should be classified and that neuron's activation is 0.5).
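As a quick numeric check (not part of the original post), evaluating sigmoid_prime at a few values of z shows how the gradient peaks at z = 0 and collapses as |z| grows:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

for z in (0.0, 2.0, 10.0, 50.0):
    # 0.25 at z = 0, about 1e-1 at z = 2, about 4.5e-05 at z = 10, vanishing at z = 50
    print(z, sigmoid_prime(z))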

Because I kept adding to each neuron's input_sum (z) and never reset its value for the new dot(weights, activations) of each fresh input, the value of z kept growing, continually slowing the rate of change of the weights until weight modification ground to a halt. I added the following line to fix this:

self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
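To make the effect concrete, here is a minimal, hypothetical sketch (not the original class) of that accumulation bug: if the running sums are not zeroed before each forward pass, z keeps growing from call to call:

import numpy as np

layer2 = np.random.randn(3, 4)          # 3 output neurons, 4 hidden activations
layer2_inputsums = np.zeros((3, 1))     # persists between calls, like an instance attribute

def feedforward_buggy(hidden_activations):
    # never resets layer2_inputsums, so z accumulates across every call
    for neuron in range(3):
        for weight in range(4):
            layer2_inputsums[neuron] += hidden_activations[weight] * layer2[neuron][weight]
    return layer2_inputsums.copy()

hidden = np.ones((4, 1))
print(feedforward_buggy(hidden))        # z after the first pass
print(feedforward_buggy(hidden))        # twice as large: the stale sums were never cleared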

The newly posted network can be copied and pasted into an editor and executed as long as you have the numpy module installed. The final line of output to be printed will be a list of 4 arrays representing the final network output. The first two are the pretest values for a negative and a positive input, respectively; these should be random. The last two are the post-test values, which show how well the network classifies positive and negative numbers. A number near 0 denotes negative, a number near 1 denotes positive.

