Performing L1 regularization on a mini-batch update
I am currently reading Neural Networks and Deep Learning and have run into a problem. The exercise asks you to modify the code he gives, which uses L2 regularization, so that it uses L1 regularization instead.
The original code using L2 regularization is:
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Update the network's weights and biases by applying gradient
    descent using backpropagation to a single mini batch. The
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
    learning rate, ``lmbda`` is the regularization parameter, and
    ``n`` is the total size of the training data set.
    """
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    self.weights = [(1-eta*(lmbda/n))*w-(eta/len(mini_batch))*nw
                    for w, nw in zip(self.weights, nabla_w)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
You can see that self.weights is updated using the L2 regularization term. For L1 regularization, I believe I only need to change that same line accordingly. The book says that we can estimate the term using a mini-batch average. That was a confusing statement to me, but I took it to mean that for each mini-batch, each layer uses the average of nabla_w. That led me to make the following edit to the code:
def update_mini_batch(self, mini_batch, eta, lmbda, n):
    """Update the network's weights and biases by applying gradient
    descent using backpropagation to a single mini batch. The
    ``mini_batch`` is a list of tuples ``(x, y)``, ``eta`` is the
    learning rate, ``lmbda`` is the regularization parameter, and
    ``n`` is the total size of the training data set.
    """
    nabla_b = [np.zeros(b.shape) for b in self.biases]
    nabla_w = [np.zeros(w.shape) for w in self.weights]
    for x, y in mini_batch:
        delta_nabla_b, delta_nabla_w = self.backprop(x, y)
        nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
    avg_nw = [np.array([[np.average(layer)] * len(layer[0])] * len(layer))
              for layer in nabla_w]
    self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw
                    for w, nw in zip(self.weights, avg_nw)]
    self.biases = [b-(eta/len(mini_batch))*nb
                   for b, nb in zip(self.biases, nabla_b)]
But the results I get are almost pure noise, with accuracy around 10%. Am I misinterpreting the statement, or is my code wrong? Any hints would be appreciated.
That isn't right.
Conceptually, L2 regularization says that after each training iteration we shrink W geometrically, multiplying it by a factor slightly below one. That way, if a value in W becomes very large, it shrinks by more in absolute terms. This prevents individual values in W from growing too large.
Conceptually, L1 regularization says that after each training iteration we reduce W linearly toward zero by a constant amount, without crossing zero: positive values decrease toward zero but not below it, and negative values increase toward zero but not above it. This eliminates very small values in W, leaving only the values that make a significant contribution.
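The difference between the two decay rules can be seen on a small array. This is an illustrative sketch with made-up hyperparameter values, not code from the book:

```python
import numpy as np

w = np.array([0.5, -0.5, 0.01, -0.01])
eta, lmbda, n = 0.5, 1.0, 10.0  # hypothetical hyperparameters

# L2: multiplicative (geometric) shrinkage -- larger weights lose more.
w_l2 = (1 - eta * (lmbda / n)) * w

# L1: subtract a constant from each weight's magnitude, clamping at
# zero so no weight crosses over to the opposite sign.
step = eta * (lmbda / n)
w_l1 = np.sign(w) * np.maximum(np.abs(w) - step, 0.0)

print(w_l2)  # every weight scaled by the same factor
print(w_l1)  # tiny weights driven exactly to zero
```

Note how the L1 rule zeroes out the small weights entirely while the L2 rule merely scales them, which is exactly the sparsity-inducing behavior described above.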
Your second equation
self.weights = [(1-eta*(lmbda/n))*w-(eta)*nw
                for w, nw in zip(self.weights, avg_nw)]
does not perform that linear subtraction; it still applies the multiplicative (geometric) scaling (1-eta*(lmbda/n))*w.
Implement some function reduceLinearlyToZero that takes w and eta*(lmbda/n) and returns max(abs(w) - eta*(lmbda/n), 0) * (1.0 if w >= 0 else -1.0). Since the entries of self.weights are NumPy arrays, this should be written element-wise with NumPy:
def reduceLinearlyToZero(w, eta, lmbda, n):
    # Shrink each weight's magnitude by eta*(lmbda/n), clamping at zero
    # so no weight crosses over to the opposite sign.
    return np.sign(w) * np.maximum(np.abs(w) - eta*(lmbda/n), 0.0)
self.weights = [reduceLinearlyToZero(w, eta, lmbda, n) - (eta/len(mini_batch))*nw
                for w, nw in zip(self.weights, nabla_w)]
or possibly
self.weights = [reduceLinearlyToZero(w - (eta/len(mini_batch))*nw, eta, lmbda, n)
                for w, nw in zip(self.weights, nabla_w)]
(Note that this uses the summed nabla_w divided by len(mini_batch), not your layer-averaged avg_nw: replacing every gradient entry in a layer with the layer-wide average destroys the per-weight gradient signal, which is consistent with the near-random accuracy you saw.)
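Putting the pieces together, here is a minimal standalone sketch of the L1 weight update following the clamped linear shrink described above. The function name l1_update_weights and its parameter names are my own for illustration; weights and grad_w stand in for self.weights and the summed nabla_w:

```python
import numpy as np

def l1_update_weights(weights, grad_w, eta, lmbda, n, batch_size):
    """L1 variant of the mini-batch weight update.

    weights    -- list of weight matrices (one per layer)
    grad_w     -- list of gradients summed over the mini-batch
    eta        -- learning rate
    lmbda      -- regularization parameter
    n          -- total training-set size
    batch_size -- number of examples in the mini-batch
    """
    step = eta * (lmbda / n)
    # Shrink each weight's magnitude by `step` (clamped at zero),
    # then take the usual averaged gradient step.
    return [np.sign(w) * np.maximum(np.abs(w) - step, 0.0)
            - (eta / batch_size) * nw
            for w, nw in zip(weights, grad_w)]

# Example: one 2x2 layer with zero gradient, so only the L1 shrink applies.
w = [np.array([[0.2, -0.03], [-0.4, 0.01]])]
g = [np.zeros((2, 2))]
print(l1_update_weights(w, g, eta=0.5, lmbda=1.0, n=10.0, batch_size=5))
```

With these numbers the shrink step is 0.05, so the two small weights are driven exactly to zero while the larger ones just move 0.05 closer to it.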