Performing L1 regularization on a mini-batch update
I am currently reading Neural Networks and Deep Learning and have run into a problem. The exercise asks you to modify the code he gives, which uses L2 regularization, so that it uses L1 regularization instead.
The original code using L2 regularization is:
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Update the network's weights and biases by applying gradient
    descent using backpropagation to a single mini batch. The
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
    learning rate, ``lmbda`` is the regularization parameter, and
    ``n`` is the total size of the training data set.
    """
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
You can see that self.weights is updated using the L2 regularization term. For L1 regularization, I believe I only need to change that same line accordingly. The book says that we can estimate the term using a mini-batch average. That was a confusing statement to me, but I took it to mean that for each mini-batch, each layer uses the average of nabla_w. That led me to make the following edit to the code:
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Update the network's weights and biases by applying gradient
    descent using backpropagation to a single mini batch. The
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
    learning rate, ``lmbda`` is the regularization parameter, and
    ``n`` is the total size of the training data set.
    """
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    avg_nw = [np.array([[np.average(layer)] * len(layer[0])] * len(layer))
              for layer in nabla_w]
    self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw
                    for w, nw in zip(self.weights, avg_nw)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
But the results I get are almost pure noise, with accuracy around 10%. Am I misinterpreting the statement, or is my code wrong? Any hints would be appreciated.
That isn't right.
Conceptually, L2 regularization says that after each training iteration we shrink W geometrically, multiplying it by a factor slightly below one. That way, if a value in W becomes very large, it shrinks by more in absolute terms. This prevents individual values in W from growing too large.
Conceptually, L1 regularization says that after each training iteration we reduce W linearly toward zero by a constant amount, without crossing zero: positive values decrease toward zero but not below it, and negative values increase toward zero but not above it. This eliminates very small values in W, leaving only the values that make a significant contribution.
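The difference between the two decay rules can be seen on a small array. This is an illustrative sketch with made-up hyperparameter values, not code from the book:

```python
import numpy as np

w = np.array([0.5, -0.5, 0.01, -0.01])
eta, lmbda, n = 0.5, 1.0, 10.0  # hypothetical hyperparameters

# L2: multiplicative (geometric) shrinkage -- larger weights lose more.
w_l2 = (1 - eta * (lmbda / n)) * w

# L1: subtract a constant from each weight's magnitude, clamping at
# zero so no weight crosses over to the opposite sign.
step = eta * (lmbda / n)
w_l1 = np.sign(w) * np.maximum(np.abs(w) - step, 0.0)

print(w_l2)  # every weight scaled by the same factor
print(w_l1)  # tiny weights driven exactly to zero
```

Note how the L1 rule zeroes out the small weights entirely while the L2 rule merely scales them, which is exactly the sparsity-inducing behavior described above.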
Your second equation
self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw
                for w, nw in zip(self.weights, avg_nw)]
does not perform that linear subtraction; it still applies the multiplicative (geometric) scaling (1-eta*(lmbda/n))*w.
Implement some function reduceLinearlyToZero that takes w and eta*(lmbda/n) and returns max(abs(w) - eta*(lmbda/n), 0) * (1.0 if w >= 0 else -1.0). Since the entries of self.weights are NumPy arrays, this should be written element-wise with NumPy:
def reduceLinearlyToZero(w, eta, lmbda, n):
    # Shrink each weight's magnitude by eta*(lmbda/n), clamping at zero
    # so no weight crosses over to the opposite sign.
    return np.sign(w) * np.maximum(np.abs(w) - eta*(lmbda/n), 0.0)
self.weights = [reduceLinearlyToZero(w, eta, lmbda, n) - (eta/len(mini_batch))*nw
                for w, nw in zip(self.weights, nabla_w)]
or possibly
self.weights = [reduceLinearlyToZero(w - (eta/len(mini_batch))*nw, eta, lmbda, n)
                for w, nw in zip(self.weights, nabla_w)]
(Note that this uses the summed nabla_w divided by len(mini_batch), not your layer-averaged avg_nw: replacing every gradient entry in a layer with the layer-wide average destroys the per-weight gradient signal, which is consistent with the near-random accuracy you saw.)
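Putting the pieces together, here is a minimal standalone sketch of the L1 weight update following the clamped linear shrink described above. The function name l1_update_weights and its parameter names are my own for illustration; weights and grad_w stand in for self.weights and the summed nabla_w:

```python
import numpy as np

def l1_update_weights(weights, grad_w, eta, lmbda, n, batch_size):
    """L1 variant of the mini-batch weight update.

    weights    -- list of weight matrices (one per layer)
    grad_w     -- list of gradients summed over the mini-batch
    eta        -- learning rate
    lmbda      -- regularization parameter
    n          -- total training-set size
    batch_size -- number of examples in the mini-batch
    """
    step = eta * (lmbda / n)
    # Shrink each weight's magnitude by `step` (clamped at zero),
    # then take the usual averaged gradient step.
    return [np.sign(w) * np.maximum(np.abs(w) - step, 0.0)
            - (eta / batch_size) * nw
            for w, nw in zip(weights, grad_w)]

# Example: one 2x2 layer with zero gradient, so only the L1 shrink applies.
w = [np.array([[0.2, -0.03], [-0.4, 0.01]])]
g = [np.zeros((2, 2))]
print(l1_update_weights(w, g, eta=0.5, lmbda=1.0, n=10.0, batch_size=5))
```

With these numbers the shrink step is 0.05, so the two small weights are driven exactly to zero while the larger ones just move 0.05 closer to it.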