
Trying to write my own Neural Network in Python

Last semester I took Stanford's online machine learning class taught by Professor Ng ( http://www.ml-class.org/course/auth/welcome ) and found it very useful. To better know/understand neural networks, I tried to write my own in Python. Here it is:

import numpy

class NN:

    def __init__(self, sl):

        #sl = number of units (not counting bias unit) in layer l
        self.sl = sl
        self.layers = len(sl)

        #Create weights
        self.weights = []
        for idx in range(1, self.layers):
            self.weights.append(numpy.matrix(numpy.random.rand(self.sl[idx-1]+1, self.sl[idx])/5))

        self.cost = []

    def update(self, input):

        if input.shape[1] != self.sl[0]:
            raise ValueError, 'The first layer must have a node for every feature'

        self.z = []
        self.a = []

        #Input activations.  I'm expecting inputs as numpy matrix (Examples x Features)
        self.a.append(numpy.hstack((numpy.ones((input.shape[0], 1)), input)))#Set inputs ai + bias unit

        #Hidden activations
        for weight in self.weights:         
            self.z.append(self.a[-1]*weight)
            self.a.append(numpy.hstack((numpy.ones((self.z[-1].shape[0], 1)), numpy.tanh(self.z[-1])))) #tanh is a fancy sigmoid

        #Output activation
        self.a[-1] = self.z[-1] #Not logistic regression thus no sigmoid function
        del self.z[-1]

    def backPropagate(self, targets, lamda):

        m = float(targets.shape[0]) #m is number of examples

        #Calculate cost
        Cost = -1/m*sum(numpy.power(self.a[-1] - targets, 2))
        for weight in self.weights:
            Cost = Cost + lamda/(2*m)*numpy.power(weight[1:, :], 2).sum()
        self.cost.append(abs(float(Cost)))

        #Calculate error for each layer
        delta = []
        delta.append(self.a[-1] - targets)
        for idx in range(1, self.layers-1): #No delta for the input layer because it is the input
            weight = self.weights[-idx][1:, :] #Ignore bias unit
            dsigmoid = numpy.multiply(self.a[-(idx+1)][:,1:], 1-self.a[-(idx+1)][:,1:]) #dsigmoid is a(l).*(1-a(l))
            delta.append(numpy.multiply(delta[-1]*weight.T, dsigmoid)) #Ignore Regularization

        Delta = []
        for idx in range(self.layers-1):
            Delta.append(self.a[idx].T*delta[-(idx+1)])

        self.weight_gradient = []
        for idx in range(len(Delta)):
            self.weight_gradient.append(numpy.nan_to_num(1/m*Delta[idx] + numpy.vstack((numpy.zeros((1, self.weights[idx].shape[1])), lamda/m*self.weights[idx][1:, :]))))

    def train(self, input, targets, alpha, lamda, iterations = 1000):

        #alpha: learning rate
        #lamda: regularization term

        for i in range(iterations):
            self.update(input)
            self.backPropagate(targets, lamda)
            self.weights = [self.weights[idx] - alpha*self.weight_gradient[idx] for idx in range(len(self.weights))]

    def predict(self, input):

        self.update(input)
        return self.a[-1]

But it doesn't work =( . Inspecting the cost vs. iteration, I can see a small blip in the cost, and the predictions for A are all the same. Can someone help me understand why my neural network is not converging?

Thanks, and sorry for the amount of code (maybe someone will find it useful).
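
One standard way to debug non-convergence (it is covered in the same ml-class) is finite-difference gradient checking. Below is a minimal sketch against the NN class above; gradient_check is a hypothetical helper, and it assumes the analytic gradients correspond to the unregularized cost J = 1/(2m)*sum((a-t)^2). Since backPropagate's derivative term a.*(1-a) matches the logistic sigmoid rather than tanh, the check should expose a mismatch on the hidden-layer weights:

import numpy

def gradient_check(net, inputs, targets, eps=1e-4):
    #Hypothetical helper: compare one analytic gradient entry from
    #backPropagate against a centered finite difference of the
    #unregularized squared-error cost J = 1/(2m)*sum((a-t)^2)
    def cost():
        net.update(inputs)
        m = float(targets.shape[0])
        return numpy.power(net.a[-1] - targets, 2).sum()/(2*m)

    net.update(inputs)
    net.backPropagate(targets, 0.0) #analytic gradients, lamda = 0
    w = net.weights[0]              #probe a hidden-layer weight
    saved = w[0, 0]
    w[0, 0] = saved + eps
    c_plus = cost()
    w[0, 0] = saved - eps
    c_minus = cost()
    w[0, 0] = saved                 #restore the weight
    print (c_plus - c_minus)/(2*eps), net.weight_gradient[0][0, 0]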

Update:

Instead of using random data, I've gotten some structured data from the UCI Machine Learning Repository. The particular data set is the burned area of forest fires in the northeast region of Portugal, using meteorological and other data: http://archive.ics.uci.edu/ml/datasets/Forest+Fires I modified the data so that days and months are numbers: https://docs.google.com/spreadsheet/ccc?key=0Am3oTptaLsExdC1PeXl1eTczRnRNejl3QUo5RjNLVVE
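
For reference, here is a minimal sketch of the kind of conversion described, assuming the raw UCI file forestfires.csv with the month and day names in the third and fourth columns; the exact mappings are my own assumption, not necessarily what the linked spreadsheet used:

import csv

#Map the month/day names in the raw UCI file to numbers (assumed mapping)
months = {'jan':1, 'feb':2, 'mar':3, 'apr':4, 'may':5, 'jun':6,
          'jul':7, 'aug':8, 'sep':9, 'oct':10, 'nov':11, 'dec':12}
days = {'mon':1, 'tue':2, 'wed':3, 'thu':4, 'fri':5, 'sat':6, 'sun':7}

rows = list(csv.reader(open('forestfires.csv', 'rb')))
for row in rows[1:]:         #skip the header row
    row[2] = months[row[2]]  #month name -> 1..12
    row[3] = days[row[3]]    #day name -> 1..7
csv.writer(open('FF-data.csv', 'wb')).writerows(rows)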

data = numpy.loadtxt(open('FF-data.csv', 'rb'), delimiter = ',', skiprows = 1)
features = data[:,0:11]
targets = numpy.matrix(data[:,12]).T

nfeatures = (features-features.mean(axis=0))/features.std(axis=0)

n = NN([11, 10, 1]) #The class takes the list of how many nodes in each layer
n.train(nfeatures, targets, 0.003, 0.0)

import matplotlib.pyplot
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.plot(n.cost)
matplotlib.pyplot.title('Cost vs. Iteration')

matplotlib.pyplot.subplot(222)
matplotlib.pyplot.scatter(n.predict(nfeatures), targets)
matplotlib.pyplot.title('Data vs. Predicted')

matplotlib.pyplot.savefig('Report.png', format = 'png')
matplotlib.pyplot.close()

為什么成本低於4000左右,為什么Data Vs. 預測沒有任何趨勢? 您可以在此處查看圖表: https//docs.google.com/open?id = 0B23oTptaLsExMTQ0OTAxNWEtYjE2NS00MjA5LTg1MjMtNDBhYjVmMTFhZDhm

(Sorry, I don't have enough rep to add comments, so I'll just keep posting answers instead.)

Yes, it does seem strange. If, however, you generate a new matrix B after training:

B = numpy.random.rand(5, 4)/5
Targets = B*X
print n.predict(B)
print B*X

it will work fine (most of the time - sometimes it will still give the average(targets) as the answer). Note: I switched to using only 4 features instead of 100.

Also, I don't think that running 5000 iterations on 50 elements of the data set will do you much good. You should generally try to use as much training data as you can - here you can use as much as you want, yet you are using even fewer examples than you have features.

This is interesting, I'll think about it some more :) I was using your network for a simpler example: as input I provided two numbers and expected their sum as output. It worked more or less okay.
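
A minimal sketch of that sanity check, assuming the NN class from the question; the layer sizes and learning rate here are my own picks:

import numpy

X = numpy.matrix(numpy.random.rand(200, 2)) #two random inputs per example
T = X[:, 0] + X[:, 1]                       #target is their sum

net = NN([2, 5, 1]) #2 inputs, one small hidden layer, 1 linear output
net.train(X, T, 0.1, 0.0, 2000)
print net.predict(X[:5])
print T[:5]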

The neural network was unable to train on the Forest Fires data ( https://docs.google.com/spreadsheet/ccc?key=0Am3oTptaLsExdC1PeXl1eTczRnRNejl3QUo5RjNLVVE ) for a few reasons.

First, the numpy.tanh() sigmoid function did not behave as expected. The code should be changed from:

self.a.append(numpy.hstack((numpy.ones((self.z[-1].shape[0], 1)),numpy.tanh(self.z[-1])))) #tanh is a fancy sigmoid

to:

self.a.append(numpy.hstack((numpy.ones((self.z[-1].shape[0], 1)), 1/(1+numpy.exp(-self.z[-1])))))
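
The likely reason tanh "misbehaves" here is that backPropagate uses the derivative term a.*(1-a), which is the derivative of the logistic sigmoid but not of tanh, whose derivative is 1-a^2. A quick numerical check:

import numpy

z = numpy.linspace(-2, 2, 5)
sig = 1/(1+numpy.exp(-z))
th = numpy.tanh(z)

#Logistic sigmoid: its derivative really is a*(1-a)
print numpy.allclose(sig*(1-sig), numpy.exp(-z)/numpy.power(1+numpy.exp(-z), 2)) #True
#tanh: a*(1-a) is NOT its derivative; the correct term would be 1-a**2
print numpy.allclose(th*(1-th), 1-th**2) #False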

Second, numpy and matplotlib are not playing nicely together: the numpy matrices seem to be plotted backwards. This can be fixed by using matrix.tolist(). The code changes from:

matplotlib.pyplot.scatter(n.predict(nfeatures), targets)

to:

matplotlib.pyplot.scatter(n.predict(nfeatures).tolist(), targets.tolist())

最后,節點數應約為示例大小的10%。 而不是10,最好使用50個節點。

The working neural network code is posted below with a new function, autoparam, which tries to find the best learning rate and regularization constant. You can see the graphs of Cost vs. Iteration and Data vs. Predicted for the forest fire data here: https://docs.google.com/open?id=0B23oTptaLsExMWQ4ZWM1ODYtZDMzMC00M2VkLWI1OWUtYzg3NzgxNWYyMTIy

Thanks for reading! I hope my neural network can help people.

import numpy

class NN:

    def __init__(self, sl):

        #sl = number of units (not counting bias unit) in layer l
        self.sl = sl
        self.layers = len(sl)

        #Create weights
        self.weights = []
        for idx in range(1, self.layers):
            self.weights.append(numpy.matrix(numpy.random.rand(self.sl[idx-1]+1, self.sl[idx]))/5)

        self.cost = []

    def update(self, input):

        if input.shape[1] != self.sl[0]:
            raise ValueError, 'The first layer must have a node for every feature'

        self.z = []
        self.a = []

        #Input activations.  Expected inputs as numpy matrix (Examples x Features)
        self.a.append(numpy.hstack((numpy.ones((input.shape[0], 1)), input)))#Set inputs ai + bias unit

        #Hidden activations
        for weight in self.weights: 
            self.z.append(self.a[-1]*weight)
            self.a.append(numpy.hstack((numpy.ones((self.z[-1].shape[0], 1)), 1/(1+numpy.exp(-self.z[-1]))))) #sigmoid

        #Output activation
        self.a[-1] = self.z[-1] #Not logistic regression thus no sigmoid function
        del self.z[-1]

    def backPropagate(self, targets, lamda):

        m = float(targets.shape[0]) #m is number of examples

        #Calculate cost
        Cost = -1/m*sum(numpy.power(self.a[-1] - targets, 2))
        for weight in self.weights:
            Cost = Cost + lamda/(2*m)*numpy.power(weight[1:, :], 2).sum()
        self.cost.append(abs(float(Cost)))

        #Calculate error for each layer
        delta = []
        delta.append(self.a[-1] - targets)
        for idx in range(1, self.layers-1): #No delta for the input layer because it is the input
            weight = self.weights[-idx][1:, :] #Ignore bias unit
            dsigmoid = numpy.multiply(self.a[-(idx+1)][:,1:], 1-self.a[-(idx+1)][:,1:]) #dsigmoid is a(l).*(1-a(l))
            delta.append(numpy.multiply(delta[-1]*weight.T, dsigmoid)) #Ignore Regularization

        Delta = []
        for idx in range(self.layers-1):
            Delta.append(self.a[idx].T*delta[-(idx+1)])

        self.weight_gradient = []
        for idx in range(len(Delta)):
            self.weight_gradient.append(numpy.nan_to_num(1/m*Delta[idx] + numpy.vstack((numpy.zeros((1, self.weights[idx].shape[1])), lamda/m*self.weights[idx][1:, :]))))

    def train(self, input, targets, alpha, lamda, iterations = 1000):

        #alpha: learning rate
        #lamda: regularization term

        for i in range(iterations):
            self.update(input)
            self.backPropagate(targets, lamda)
            self.weights = [self.weights[idx] - alpha*self.weight_gradient[idx] for idx in range(len(self.weights))]

    def autoparam(self, data, alpha = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3], lamda = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]):

        #data: numpy matrix with targets in last column
        #alpha: learning rate
        #lamda: regularization term

        #Create training, cross validation, and test sets
        while 1:
            try:
                numpy.seterr(invalid = 'raise')
                numpy.random.shuffle(data) #Shuffle data
                training_set = data[0:data.shape[0]/10*6, 0:-1]
                self.ntraining_set = (training_set-training_set.mean(axis=0))/training_set.std(axis=0)
                self.training_tgt = numpy.matrix(data[0:data.shape[0]/10*6, -1]).T

                cv_set = data[data.shape[0]/10*6:data.shape[0]/10*8, 0:-1]
                self.ncv_set = (cv_set-cv_set.mean(axis=0))/cv_set.std(axis=0)
                self.cv_tgt = numpy.matrix(data[data.shape[0]/10*6:data.shape[0]/10*8, -1]).T

                test_set = data[data.shape[0]/10*8:, 0:-1]
                self.ntest_set = (test_set-test_set.mean(axis=0))/test_set.std(axis=0)
                self.test_tgt = numpy.matrix(data[data.shape[0]/10*8:, -1]).T

                break

            except FloatingPointError:
                pass

        numpy.seterr(invalid = 'warn')
        cost = 999999
        for i in alpha:
            for j in lamda:
                self.__init__(self.sl)
                self.train(self.ntraining_set, self.training_tgt, i, j, 2000)
                current_cost = 1/float(cv_set.shape[0])*sum(numpy.square(self.predict(self.ncv_set) - self.cv_tgt)).tolist()[0][0]
                print current_cost
                if current_cost < cost:
                    cost = current_cost
                    self.learning_rate = i
                    self.regularization = j
        self.__init__(self.sl)

    def predict(self, input):

        self.update(input)
        return self.a[-1]

Loading data, plotting, etc...

data = numpy.loadtxt(open('FF-data.csv', 'rb'), delimiter = ',', skiprows = 1)#Load
numpy.random.shuffle(data)

features = data[:,0:11]
nfeatures = (features-features.mean(axis=0))/features.std(axis=0)
targets = numpy.matrix(data[:, 12]).T

n = NN([11, 50, 1])

n.train(nfeatures, targets, 0.07, 0.0, 2000)

import matplotlib.pyplot
matplotlib.pyplot.subplot(221)
matplotlib.pyplot.plot(n.cost)
matplotlib.pyplot.title('Cost vs. Iteration')

matplotlib.pyplot.subplot(222)
matplotlib.pyplot.scatter(n.predict(nfeatures).tolist(), targets.tolist())
matplotlib.pyplot.plot(targets.tolist(), targets.tolist(), c = 'r')
matplotlib.pyplot.title('Data vs. Predicted')

matplotlib.pyplot.savefig('Report.png', format = 'png')
matplotlib.pyplot.close()
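
autoparam is defined above but never called in the post. Below is a hypothetical invocation, assuming targets go in the last column as its comment says and skipping column 11 as the loading code above does; one would then retrain with the constants it found:

n = NN([11, 50, 1])
n.autoparam(data[:, range(11) + [12]]) #columns 0-10 = features, 12 = area
n.train(nfeatures, targets, n.learning_rate, n.regularization, 2000)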

I think that your bias should be subtracted somewhere from the weighted inputs (or set to -1). From what I see in your code, the neurons add all the inputs, including the bias (which is set to +1).
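
A small numpy-only check of how the bias is wired in the code above: the +1 column gets a learned weight, and since that weight can be negative it can also act as a subtracted threshold:

import numpy

a = numpy.matrix([[1.0, 0.5, 2.0]])      #[bias, x1, x2], bias fixed at +1
w = numpy.matrix([[-3.0], [1.0], [1.0]]) #learned w0 = -3 acts as threshold 3
print a*w                                #x1 + x2 - 3 = -0.5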



 