[英]Neural Network seems to be getting stuck on a single output with each execution
我創建了一個神經網絡來估計輸入x
的sin(x)
函數。 該網絡有 21 個輸出神經元(代表數字 -1.0、-0.9、...、0.9、1.0),其中 numpy 無法學習,因為我認為在定義前饋機制時我錯誤地實現了神經元架構。
當我執行代碼時,它正確估計的測試數據量大約為 48/1000。 如果您在 21 個類別之間拆分 1000 個測試數據點,這恰好是每個類別的平均數據點數。 查看網絡輸出,您可以看到網絡似乎只是開始為每個輸入選擇一個輸出值。 例如,它可能會選擇 -0.5 作為y
的估計值,而不管您給它的x
。 我這里哪里出錯了? 這是我的第一個網絡。 謝謝!
import random
import numpy as np
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Create weight vector arrays to represent each layer size and initialize indices randomly on a Gaussian distribution.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
# print(self.layer1)
# print()
# print(self.layer2)
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Propogate forward through network as if doing this by hand.
#first layer's output activations:
for neuron in range(self.hiddenLayerSize):
self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
def train(self, training_pairs, epochs, minibatchsize, learn_rate):
#apply gradient descent
test_data = build_sinx_data(1000)
for epoch in range(epochs):
random.shuffle(training_pairs)
minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
for minibatch in minibatches:
loss = 0 #calculate loss for each minibatch
#Begin training
for x, y in minibatch:
network_output = self.feedforward(x)
loss += (network_output - y) ** 2
#adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output)*learn_rate
loss /= (2*len(minibatch))
adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
self.layer1 += adjustWeights
#print(adjustWeights)
self.layer2 += adjustWeights
#when line 63 placed here, results did not improve during minibatch.
print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
print("Training Complete")
def evaluate(self, test_data):
"""
Returns number of test inputs which network evaluates correctly.
The ouput assumed to be neuron in output layer with highest activation
:param test_data: test data set identical in form to train data set.
:return: integer sum
"""
correct = 0
for x, y in test_data:
output = self.feedforward(x)
if output == y:
correct+=1
return(correct)
def build_sinx_data(data_points):
"""
Creates a list of tuples (x value, expected y value) for Sin(x) function.
:param data_points: number of desired data points
:return: list of tuples (x value, expected y value
"""
x_vals = []
y_vals = []
for i in range(data_points):
#parameter of randint signifies range of x values to be used*10
x_vals.append(random.randint(-2000,2000)/10)
y_vals.append(round(math.sin(x_vals[i]),1))
return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate
sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))
我沒有徹底檢查你的所有代碼,但有些問題很明顯:
*
運算符不在numpy 中執行矩陣乘法,您必須使用numpy.dot
。 例如,這會影響以下network_input * self.layer1[neuron]
行: network_input * self.layer1[neuron]
、 self.layer1_activations[weight]*self.layer2[neuron][weight]
等。
似乎您正在通過分類(從 21 個類別中選擇 1 個)解決問題,但使用 L2 損失。 這有點混亂。 您有兩種選擇:要么堅持分類並使用交叉熵損失函數,要么使用 L2 損失進行回歸(即預測數值)。
您絕對應該提取sigmoid
函數以避免再次編寫相同的表達式:
def sigmoid(z): return 1 / (1 + np.exp(-z)) def sigmoid_derivative(x): return sigmoid(x) * (1 - sigmoid(x))
您對self.layer1
和self.layer2
執行相同的更新,這顯然是錯誤的。 花一些時間分析反向傳播的工作原理。
我編輯了如何將我的損失函數集成到我的函數中,並正確實現了梯度下降。 我還刪除了小批量的使用並簡化了我的網絡嘗試做的事情。 我現在有一個網絡,它試圖將事物分類為偶數或奇數。
我用來解決問題的一些非常有用的指南:
神經網絡和深度學習的第 1 章和第 2 章,作者 Michael Nielsen,可在http://neuralnetworksanddeeplearning.com/chap1.html免費獲得。 本書詳細解釋了神經網絡的工作原理,包括其執行背后的數學分解。
從一開始的反向傳播,由 Erik Hallström 撰寫,由 Maxim 鏈接。 https://medium.com/@erikhallstrm/backpropagation-from-the-beginning-77356edf427d 。 不像上面的指南那么徹底,但我同時打開了這兩個指南,因為本指南更重要的是關於什么是重要的,以及如何應用Nielsen 書中詳盡解釋的數學公式。
如何用 9 行 Python 代碼構建一個簡單的神經網絡https://medium.com/technology-invention-and-more/how-to-build-a-simple-neural-network-in-9-lines-of -python-code-cc8f23647ca1 。 對一些神經網絡基礎知識的有用且快速的介紹。
這是我的(現在正在運行)代碼:
import random
import numpy as np
import scipy
import math
class Network(object):
def __init__(self,inputLayerSize,hiddenLayerSize,outputLayerSize):
#Layers represented both by their weights array and activation and inputsums vectors.
self.layer1 = np.random.randn(hiddenLayerSize,inputLayerSize)
self.layer2 = np.random.randn(outputLayerSize,hiddenLayerSize)
self.layer1_activations = np.zeros((hiddenLayerSize, 1))
self.layer2_activations = np.zeros((outputLayerSize, 1))
self.layer1_inputsums = np.zeros((hiddenLayerSize, 1))
self.layer2_inputsums = np.zeros((outputLayerSize, 1))
self.layer1_errorsignals = np.zeros((hiddenLayerSize, 1))
self.layer2_errorsignals = np.zeros((outputLayerSize, 1))
self.layer1_deltaw = np.zeros((hiddenLayerSize, inputLayerSize))
self.layer2_deltaw = np.zeros((outputLayerSize, hiddenLayerSize))
self.outputLayerSize = outputLayerSize
self.inputLayerSize = inputLayerSize
self.hiddenLayerSize = hiddenLayerSize
print()
print(self.layer1)
print()
print(self.layer2)
print()
# self.weights = [np.random.randn(y,x)
# for x, y in zip(sizes[:-1], sizes[1:])]
def feedforward(self, network_input):
#Calculate inputsum and and activations for each neuron in the first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_inputsums[neuron] = network_input * self.layer1[neuron]
self.layer1_activations[neuron] = self.sigmoid(self.layer1_inputsums[neuron])
# Calculate inputsum and and activations for each neuron in the second layer. Notice that each neuron in the second layer represented by
# weights vector, consisting of all weights leading out of the kth neuron in (l-1) layer to the jth neuron in layer l.
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = self.sigmoid(self.layer2_inputsums[neuron])
return self.layer2_activations
def interpreted_output(self, network_input):
#convert layer 2 activation numbers to a single output. The neuron (weight vector) with highest activation will be output.
self.feedforward(network_input)
outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)
return(outputs[np.argmax(self.layer2_activations)])
# def build_expected_output(self, training_data):
# #Views expected output number y for each x to generate an expected output vector from the network
# index=0
# for pair in training_data:
# expected_output_vector = np.zeros((self.outputLayerSize,1))
# x = training_data[0]
# y = training_data[1]
# for i in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1, 1):
# if y == i / 10:
# expected_output_vector[i] = 1
# #expect the target category to be a 1.
# break
# training_data[index][1] = expected_output_vector
# index+=1
# return training_data
def train(self, training_data, learn_rate):
self.backpropagate(training_data, learn_rate)
def backpropagate(self, train_data, learn_rate):
#Perform for each x,y pair.
for datapair in range(len(train_data)):
x = train_data[datapair][0]
y = train_data[datapair][1]
self.feedforward(x)
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l2 " + str(self.layer2))
# print("l1 " + str(self.layer1))
for neuron in range(self.outputLayerSize):
#Calculate first error equation for error signals of output layer neurons
self.layer2_errorsignals[neuron] = (self.layer2_activations[neuron] - y[neuron]) * self.sigmoid_prime(self.layer2_inputsums[neuron])
#Use recursive formula to calculate error signals of hidden layer neurons
self.layer1_errorsignals = np.multiply(np.array(np.matrix(self.layer2.T) * np.matrix(self.layer2_errorsignals)) , self.sigmoid_prime(self.layer1_inputsums))
#print(self.layer1_errorsignals)
# for neuron in range(self.hiddenLayerSize):
# #Use recursive formula to calculate error signals of hidden layer neurons
# self.layer1_errorsignals[neuron] = np.multiply(self.layer2[neuron].T,self.layer2_errorsignals[neuron]) * self.sigmoid_prime(self.layer1_inputsums[neuron])
#Partial derivative of C with respect to weight for connection from kth neuron in (l-1)th layer to jth neuron in lth layer is
#(jth error signal in lth layer) * (kth activation in (l-1)th layer.)
#Update all weights for network at each iteration of a training pair.
#Update weights in second layer
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_deltaw[neuron][weight] = self.layer2_errorsignals[neuron]*self.layer1_activations[weight]*(-learn_rate)
self.layer2 += self.layer2_deltaw
#Update weights in first layer
for neuron in range(self.hiddenLayerSize):
self.layer1_deltaw[neuron] = self.layer1_errorsignals[neuron]*(x)*(-learn_rate)
self.layer1 += self.layer1_deltaw
#Comment/Uncomment to enable error evaluation.
#print("Epoch {0}: Error: {1}".format(datapair, self.evaluate(test_data)))
# print("l2a " + str(self.layer2_activations))
# print("l1a " + str(self.layer1_activations))
# print("l1 " + str(self.layer1))
# print("l2 " + str(self.layer2))
def evaluate(self, test_data):
error = 0
for x, y in test_data:
#x is integer, y is single element np.array
output = self.feedforward(x)
error += y - output
return error
#eval function for sin(x)
# def evaluate(self, test_data):
# """
# Returns number of test inputs which network evaluates correctly.
# The ouput assumed to be neuron in output layer with highest activation
# :param test_data: test data set identical in form to train data set.
# :return: integer sum
# """
# correct = 0
# for x, y in test_data:
# outputs = [x / 10 for x in range(-int((self.outputLayerSize / 2)), int((self.outputLayerSize / 2)) + 1,
# 1)] # range(-10, 11, 1)
# newy = outputs[np.argmax(y)]
# output = self.interpreted_output(x)
# #print("output: " + str(output))
# if output == newy:
# correct+=1
# return(correct)
def sigmoid(self, z):
return 1 / (1 + np.exp(-z))
def sigmoid_prime(self, z):
return (1 - self.sigmoid(z)) * self.sigmoid(z)
def build_simple_data(data_points):
x_vals = []
y_vals = []
for each in range(data_points):
x = random.randint(-3,3)
expected_output_vector = np.zeros((1, 1))
if x > 0:
expected_output_vector[[0]] = 1
else:
expected_output_vector[[0]] = 0
x_vals.append(x)
y_vals.append(expected_output_vector)
print(list(zip(x_vals,y_vals)))
print()
return (list(zip(x_vals,y_vals)))
simpleNet = Network(1, 3, 1)
# print("Pretest")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
# init_weights_l1 = simpleNet.layer1
# init_weights_l2 = simpleNet.layer2
# simpleNet.train(build_simple_data(10000),.1)
# #sometimes Error converges to 0, sometimes error converges to 10.
# print("Initial Weights:")
# print(init_weights_l1)
# print(init_weights_l2)
# print("Final Weights")
# print(simpleNet.layer1)
# print(simpleNet.layer2)
# print("Post-test")
# print(simpleNet.feedforward(-3))
# print(simpleNet.feedforward(10))
def test_network(iterations,net,training_points):
"""
Casually evaluates pre and post test
:param iterations: number of trials to be run
:param net: name of network to evaluate.
;param training_points: size of training data to be used
:return: four 1x1 arrays.
"""
pretest_negative = 0
pretest_positive = 0
posttest_negative = 0
posttest_positive = 0
for each in range(iterations):
pretest_negative += net.feedforward(-10)
pretest_positive += net.feedforward(10)
net.train(build_simple_data(training_points),.1)
for each in range(iterations):
posttest_negative += net.feedforward(-10)
posttest_positive += net.feedforward(10)
return(pretest_negative/iterations, pretest_positive/iterations, posttest_negative/iterations, posttest_positive/iterations)
print(test_network(10000, simpleNet, 10000))
雖然此代碼與 OP 中發布的代碼有很大不同,但有一個特別有趣的差異。 在原前饋方法通知中
#second layer's output activations use layer1's activations as input:
for neuron in range(self.outputLayerSize):
for weight in range(self.hiddenLayerSize):
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))
線
self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
長得像
self.layer2_inputsums[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
在更新的代碼中。 這條線在每個權重向量和每個輸入向量(來自第 1 層的激活)之間執行點積,以獲得神經元的 input_sum,通常稱為 z(想想 sigmoid(z))。 在我的網絡中,sigmoid 函數的導數 sigmoid_prime 用於計算成本函數相對於所有權重的梯度。 通過將 sigmoid_prime(z) * 實際輸出和預期輸出之間的網絡誤差相乘。 如果 z 非常大(並且為正),則神經元的激活值將非常接近 1。這意味着網絡確信該神經元應該被激活。 如果 z 非常負,情況也是如此。 然后,網絡不想從根本上調整它滿意的權重,因此神經元每個權重的變化范圍由 sigmoid(z) 和 sigmoid_prime(z) 的梯度給出。 非常大的 z 意味着非常小的梯度和應用於權重的非常小的變化(sigmoid 的梯度在 z = 0 時最大化,當網絡不確定神經元應該如何分類並且該神經元的激活為 0.5 時)。
由於我不斷增加每個神經元的 input_sum (z) 並且從不重置 dot(weights, activations) 的新輸入的值,z 的值不斷增長,不斷減慢權重的變化率,直到權重修改增長到停頓。 我添加了以下行來解決這個問題:
self.layer2_inputsums = np.zeros((self.outputLayerSize, 1))
只要您安裝了 numpy 模塊,就可以將新發布的網絡復制並粘貼到編輯器中並執行。 要打印的最后一行輸出將是代表最終網絡輸出的 4 個數組的列表。 前兩個分別是負輸入和正輸入的預測試值。 這些應該是隨機的。 后兩個是測試后值,用於確定網絡分類為正數和負數的程度。 接近 0 的數字表示負數,接近 1 的數表示正數。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.