Stuck implementing a simple neural network

Question

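I've been banging my head against this brick wall for what seems like an eternity, and I just can't get around it. I'm trying to implement an autoencoder using only numpy and matrix multiplication. No theano or keras tricks allowed.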

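I'll describe the problem and all its details. It seems a bit complicated at first because there are a lot of variables, but it really is quite straightforward.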

What we know

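1) X is an m × n matrix, and it is our input. The inputs are the rows of this matrix: each input is an n-dimensional row vector, and we have m of them.

2) The number of neurons in our (single) hidden layer, which is k.

3) The activation function of our neurons (the sigmoid, denoted g(x)) and its derivative g'(x).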

What we don't know and want to find

Overall, our goal is to find 6 matrices: w1 is n by k, b1 is m by k, w2 is k by n, b2 is m by n, w3 is n by n, and b3 is m by n.

They are initialized randomly, and we find the best solution using gradient descent.
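As a concrete picture of "initialize randomly, then descend", here is a minimal sketch that trains only the last linear layer (w3, b3) while holding h2 fixed at random stand-in values. The learning rate, the number of steps, and the vector-shaped b3 are assumptions made for illustration, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 2
x = rng.random((m, n))    # data
h2 = rng.random((m, n))   # stand-in for the output-layer activations, held fixed here
w3 = rng.random((n, n))   # random initialization
b3 = rng.random(n)        # assumption: vector bias instead of the question's (m, n) matrix

def cost(x, xhat):
    # 1/(2m) times the sum of squared errors
    return np.sum((x - xhat) ** 2) / (2.0 * x.shape[0])

lr = 0.5  # assumed step size
costs = [cost(x, h2 @ w3 + b3)]
for step in range(200):
    dSdxhat = (h2 @ w3 + b3 - x) / m
    # Step each parameter against its gradient
    w3 -= lr * (h2.T @ dSdxhat)
    b3 -= lr * dSdxhat.sum(axis=0)  # b3 is shared by all rows, so its gradient sums over them
    costs.append(cost(x, h2 @ w3 + b3))

print(costs[0], costs[-1])
```

The full autoencoder repeats the same update for all six parameters, using the gradients from backpropagation.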

The process

The entire process looks like this:

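First we compute z1 = Xw1 + b1. It is m by k and is the input to the hidden layer. Then we compute h1 = g(z1), which simply applies the sigmoid function to every element of z1. Naturally it is also m by k, and it is the output of our hidden layer.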

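Then we compute z2 = h1w2 + b2, which is m by n and is the input to the output layer of our neural network. Then we compute h2 = g(z2), which is naturally also m by n and is the output of our neural network.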

Finally, we take this output and apply a linear operator to it: Xhat = h2w3 + b3, which is also m by n and is our final result.

Where I'm stuck

The cost function I want to minimize is the mean squared error. I've already implemented it in numpy:

def cost(x, xhat):
    m = x.shape[0]  # number of examples (rows of x)
    return (1.0/(2 * m)) * np.trace(np.dot(x-xhat,(x-xhat).T))
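The trace form builds an m × m matrix only to read its diagonal, and each diagonal entry is one row's squared error. A minimal sketch of an equivalent elementwise form, assuming m is taken from x.shape[0] rather than a global (the function names are illustrative):

```python
import numpy as np

def cost_trace(x, xhat):
    # Form used in the question: trace of (x - xhat)(x - xhat)^T, scaled by 1/(2m)
    m = x.shape[0]
    return (1.0 / (2 * m)) * np.trace(np.dot(x - xhat, (x - xhat).T))

def cost_sum(x, xhat):
    # Diagonal entry i of D D^T is the squared norm of row i of D = x - xhat,
    # so the trace equals the sum of all squared entries of D
    m = x.shape[0]
    return np.sum((x - xhat) ** 2) / (2.0 * m)

rng = np.random.default_rng(0)
x = rng.random((5, 2))
xhat = rng.random((5, 2))
print(np.isclose(cost_trace(x, xhat), cost_sum(x, xhat)))  # True
```

The elementwise version avoids allocating an m × m intermediate, which matters once m is large.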

The problem is finding the derivatives of the cost with respect to w1, b1, w2, b2, w3, and b3. Let's call the cost S.

After deriving them myself and checking them numerically, I've established the following facts:

1) dSdxhat = (1.0/m) * (xhat-x)

2) dSdw3 = np.dot(h2.T,dSdxhat)

3) dSdb3 = dSdxhat

4) dSdh2 = np.dot(dSdxhat, w3.T)
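A numerical check of facts 1 to 4 could look like the following minimal sketch. It uses random stand-in values for h2, keeps b3 as an (m, n) matrix as in the question, and compares each analytic gradient against central finite differences; `numerical_grad` and the seed are illustrative choices.

```python
import numpy as np

def cost(x, xhat):
    return np.sum((x - xhat) ** 2) / (2.0 * x.shape[0])

def numerical_grad(f, p, eps=1e-6):
    # Central differences: perturb one entry of p in place, evaluate f, then restore it
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        plus = f()
        p[idx] = old - eps
        minus = f()
        p[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
m, n = 5, 2
x = rng.random((m, n))
h2 = rng.random((m, n))  # stand-in for the output-layer activations
w3 = rng.random((n, n))
b3 = rng.random((m, n))  # per-example bias, as in the question

xhat = h2 @ w3 + b3
dSdxhat = (xhat - x) / m   # fact 1: (1/m) * (xhat - x)
dSdw3 = h2.T @ dSdxhat     # fact 2
dSdb3 = dSdxhat            # fact 3
dSdh2 = dSdxhat @ w3.T     # fact 4

def f():
    # Reads the current (possibly perturbed) h2, w3, b3
    return cost(x, h2 @ w3 + b3)

print(np.allclose(dSdw3, numerical_grad(f, w3)))
print(np.allclose(dSdb3, numerical_grad(f, b3)))
print(np.allclose(dSdh2, numerical_grad(f, h2)))
```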

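But I can't for the life of me figure out dSdz2. This is the brick wall.

By the chain rule it should be dSdz2 = dSdh2 * dh2dz2, but the dimensions don't match.

What is the formula for the derivative of S with respect to z2?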
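Written out entry by entry under the question's definitions (g is the sigmoid applied elementwise, so g'(z) = g(z)(1 - g(z))), the chain-rule step looks like this. Because entry (i, j) of h2 depends only on entry (i, j) of z2, all cross terms vanish and the product is elementwise (Hadamard, written ⊙), so both factors are m × n:

```latex
(h_2)_{ij} = g\big((z_2)_{ij}\big)
\quad\Longrightarrow\quad
\frac{\partial S}{\partial (z_2)_{ij}}
  = \frac{\partial S}{\partial (h_2)_{ij}}\; g'\big((z_2)_{ij}\big)

\frac{\partial S}{\partial z_2}
  = \frac{\partial S}{\partial h_2} \odot g'(z_2)
  = \frac{\partial S}{\partial h_2} \odot h_2 \odot (1 - h_2)
```

Here 1 denotes the m × n matrix of ones. In numpy the ⊙ product is plain `*`, not `np.dot`.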

Edit: here is the code for the entire feedforward pass of my autoencoder.

import numpy as np

def g(x): #sigmoid activation functions
    return 1/(1+np.exp(-x)) #same shape as x!

def gGradient(x): #gradient of sigmoid
    return g(x)*(1-g(x)) #same shape as x!

def cost(x, xhat): #mean squared error between x the data and xhat the output of the machine
    return (1.0/(2 * m)) * np.trace(np.dot(x-xhat,(x-xhat).T))

#Just small random numbers so we can test that it's working small scale
m = 5 #num of examples
n = 2 #num of features in each example
k = 2 #num of neurons in the hidden layer of the autoencoder
x = np.random.rand(m, n) #the data, shape (m, n)

w1 = np.random.rand(n, k) #weights from input layer to hidden layer, shape (n, k)
b1 = np.random.rand(m, k) #bias term from input layer to hidden layer (m, k)
z1 = np.dot(x,w1)+b1 #output of the input layer, shape (m, k)
h1 = g(z1) #input of hidden layer, shape (m, k)

w2 = np.random.rand(k, n) #weights from hidden layer to output layer of the autoencoder, shape (k, n)
b2 = np.random.rand(m, n) #bias term from hidden layer to output layer of autoencoder, shape (m, n)
z2 = np.dot(h1, w2)+b2 #output of the hidden layer, shape (m, n)
h2 = g(z2) #Output of the entire autoencoder. The output layer of the autoencoder. shape (m, n)

w3 = np.random.rand(n, n) #weights from output layer of autoencoder to entire output of the machine, shape (n, n)
b3 = np.random.rand(m, n) #bias term from output layer of autoencoder to entire output of the machine, shape (m, n)
xhat = np.dot(h2, w3)+b3 #the output of the machine, which hopefully resembles the original data x, shape (m, n)

Answer 1

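OK, here's a suggestion. In the vector case, if x is a vector of length n, then g(x) is also a vector of length n. However, g'(x) is not a vector; it is a Jacobian matrix of size n × n. Similarly, in the mini-batch case, where X is an m × n matrix, g(X) is m × n but g'(X) is n × n. Try: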
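One detail worth adding to the Jacobian point: because the sigmoid acts on each entry independently, its Jacobian for a single example is diagonal, with g'(z_i) on the diagonal. Multiplying an upstream gradient by that Jacobian is therefore the same as an elementwise product. A small numerical sketch with random values (all names chosen for illustration):

```python
import numpy as np

def g(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(2)
n = 4
z = rng.standard_normal(n)  # pre-activation for one example, shape (n,)
v = rng.standard_normal(n)  # upstream gradient dS/dh for that example, shape (n,)

# Entry (i, j) of the Jacobian is d g(z_i) / d z_j, which is zero unless i == j
J = np.diag(g(z) * (1 - g(z)))

# Check the diagonal Jacobian against central finite differences, column by column
eps = 1e-6
J_num = np.empty((n, n))
for j in range(n):
    dz = np.zeros(n)
    dz[j] = eps
    J_num[:, j] = (g(z + dz) - g(z - dz)) / (2 * eps)

via_jacobian = v @ J
via_elementwise = v * g(z) * (1 - g(z))
print(np.allclose(J, J_num), np.allclose(via_jacobian, via_elementwise))
```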

def gGradient(x): #gradient of sigmoid
    return np.dot(g(x).T, 1 - g(x))

@Paul is right: the bias terms should be vectors, not matrices. You should have:

b1 = np.random.rand(k) #bias term from input layer to hidden layer (k,)
b2 = np.random.rand(n) #bias term from hidden layer to output layer of autoencoder, shape (n,)
b3 = np.random.rand(n) #bias term from output layer of autoencoder to entire output of the machine, shape (n,)

Thanks to numpy broadcasting, you don't need to change how xhat is computed.
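To see the broadcasting at work, here is a minimal sketch with the question's small sizes and a vector bias b1 (random values, seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 2, 2
x = rng.random((m, n))
w1 = rng.random((n, k))
b1 = rng.random(k)  # bias as a vector, shape (k,)

z1 = np.dot(x, w1) + b1  # (m, k) + (k,): b1 is added to every one of the m rows
print(z1.shape)          # (5, 2)

# Same result as tiling b1 explicitly into an (m, k) matrix
z1_tiled = np.dot(x, w1) + np.tile(b1, (m, 1))
print(np.allclose(z1, z1_tiled))  # True
```

Because a single b1 is shared by all m rows, its gradient is the sum of the per-row gradients, which is why the bias derivatives reduce over axis 0.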

Then (I think!) you can compute the derivatives like this:

dSdxhat = (1/float(m)) * (xhat-x)
dSdw3 = np.dot(h2.T,dSdxhat)
dSdb3 = dSdxhat.mean(axis=0)
dSdh2 = np.dot(dSdxhat, w3.T)
dSdz2 = np.dot(dSdh2, gGradient(z2))
dSdb2 = dSdz2.mean(axis=0)
dSdw2 = np.dot(h1.T,dSdz2)
dSdh1 = np.dot(dSdz2, w2.T)
dSdz1 = np.dot(dSdh1, gGradient(z1))
dSdb1 = dSdz1.mean(axis=0)
dSdw1 = np.dot(x.T,dSdz1)

Does this work for you?

Edit

I've decided I'm not at all sure that gGradient should be a matrix. How about this:

dSdxhat = (xhat-x) / m
dSdw3 = np.dot(h2.T,dSdxhat)
dSdb3 = dSdxhat.sum(axis=0)
dSdh2 = np.dot(dSdxhat, w3.T)
dSdz2 = h2 * (1-h2) * dSdh2
dSdb2 = dSdz2.sum(axis=0)
dSdw2 = np.dot(h1.T,dSdz2)
dSdh1 = np.dot(dSdz2, w2.T)
dSdz1 = h1 * (1-h1) * dSdh1
dSdb1 = dSdz1.sum(axis=0)
dSdw1 = np.dot(x.T,dSdz1)
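A minimal finite-difference check of these formulas could look like this. It assumes vector biases as suggested above and packs the six parameters into a dict; `forward`, `backward`, and `numerical_grad` are illustrative names, not from the original posts. It uses n = 3 and k = 2 so that a transposition mistake would surface as a shape error instead of passing silently.

```python
import numpy as np

def g(x):
    return 1 / (1 + np.exp(-x))

def forward(x, p):
    # Vector biases broadcast across the m rows
    h1 = g(x @ p["w1"] + p["b1"])
    h2 = g(h1 @ p["w2"] + p["b2"])
    xhat = h2 @ p["w3"] + p["b3"]
    return h1, h2, xhat

def cost(x, xhat):
    return np.sum((x - xhat) ** 2) / (2.0 * x.shape[0])

def backward(x, p):
    # The derivative formulas from the edit above
    m = x.shape[0]
    h1, h2, xhat = forward(x, p)
    dSdxhat = (xhat - x) / m
    dSdh2 = dSdxhat @ p["w3"].T
    dSdz2 = h2 * (1 - h2) * dSdh2
    dSdh1 = dSdz2 @ p["w2"].T
    dSdz1 = h1 * (1 - h1) * dSdh1
    return {"w3": h2.T @ dSdxhat, "b3": dSdxhat.sum(axis=0),
            "w2": h1.T @ dSdz2, "b2": dSdz2.sum(axis=0),
            "w1": x.T @ dSdz1, "b1": dSdz1.sum(axis=0)}

def numerical_grad(x, p, name, eps=1e-6):
    # Central differences on one parameter array, restoring each entry afterwards
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        plus = cost(x, forward(x, p)[2])
        p[name][idx] = old - eps
        minus = cost(x, forward(x, p)[2])
        p[name][idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(4)
m, n, k = 5, 3, 2
x = rng.random((m, n))
p = {"w1": rng.random((n, k)), "b1": rng.random(k),
     "w2": rng.random((k, n)), "b2": rng.random(n),
     "w3": rng.random((n, n)), "b3": rng.random(n)}

analytic = backward(x, p)
for name in p:
    print(name, np.allclose(analytic[name], numerical_grad(x, p, name)))
```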


1 answer

Answer 1: 4 votes, accepted, 2016-10-07 16:04:28
