
Neural network with 'tanh' as activation and 'cross-entropy' as cost function did not work

I have implemented a simple neural network. It works with 'sigmoid + cross-entropy', 'sigmoid + quadratic cost' and 'tanh + quadratic cost', but not with 'tanh + cross-entropy' (it does no better than random guessing). Can someone help me find out why? Here is the code of the FullConnectedLayer:

class FullConnectedLayer(BaseLayer):
    """
    FullConnectedLayer
    ~~~~~~~~~~~~~~~~~~~~
    Data members: 
    sizes       ---- <type list> sizes of the network
    n_layers    ---- <type int> number of sublayers
    activation  ---- <type Activation> activation function for neurons
    weights     ---- <type list> to store weights
    biases      ---- <type list> to store biases
    neurons     ---- <type list> to store states (outputs) of neurons
    zs          ---- <type list> to store weighted inputs to neurons
    grad_w      ---- <type list> to store gradient of Cost w.r.t weights
    grad_b      ---- <type list> to store gradient of Cost w.r.t biases
    ---------------------
    Methods:
    __init__(self, sizes, activation = Sigmoid())
    size(self)
    model(self)
    feedforward(self, a)
    backprop(self, C_p)
    update(self, eta, lmbda, batch_size, n)
    """

    def __init__(self, sizes, activation = Sigmoid(), normal_initialization = False):
        """
        The list ''sizes'' contains the number of neurons in the respective
        layers of the network. For example, sizes = [2, 3, 2] represents 3
        layers, with the first layer having 2 neurons, the second 3 neurons,
        and the third 2 neurons.

        Note that the input may be fed by a layer of another type connected
        before this layer, and we don't set biases for the input layer.
        Also note that the output may be passed on to another layer connected
        after this layer; in that case, just assign the outputs of this layer
        to the inputs of the next one. For example, in
        Layer1([3, 2, 4])->Layer2([4, 6, 3])->Layer3([3, 2]), assigning the
        output of Layer1 to the input of Layer2 is safe.
        """

        BaseLayer.__init__(self, sizes, activation)

        if normal_initialization:
            self.weights = [np.random.randn(j, i)
                    for i, j in zip(sizes[:-1], sizes[1:])]
        else:
            self.weights = [np.random.randn(j, i) / np.sqrt(i)
                    for i, j in zip(sizes[:-1], sizes[1:])]
        self.biases = [np.random.randn(j, 1) for j in sizes[1:]]

        self.grad_w = [np.zeros(w.shape) for w in self.weights]
        self.grad_b = [np.zeros(b.shape) for b in self.biases]

    def feedforward(self, a):
        """
        Return output of the network if ''a'' is input.
        """
        self.neurons = [a] # to store activations (outputs) of all layers
        self.zs = []
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, self.neurons[-1]) + b
            self.zs.append(z)
            self.neurons.append(self.activation.func(z))
        return self.neurons[-1]


    def backprop(self, Cp_a):
        """
        Backpropagate the delta error.
        ------------------------------
        Return the backpropagated derivative dC/da with respect to the inputs
        of this layer, to be passed on to the preceding layer. The gradients
        of the weights and biases are accumulated into ''grad_w'' and ''grad_b''.
        Cp_a, dC/da: derivative of the cost function w.r.t. a, the output of the neurons.
        """
        # The last layer
        delta = Cp_a * self.activation.prime(self.zs[-1])
        self.grad_b[-1] += delta
        self.grad_w[-1] += np.dot(delta, self.neurons[-2].transpose()) 

        for l in range(2, self.n_layers):
            sp = self.activation.prime(self.zs[-l])  # a.prime(z)
            delta = np.dot(self.weights[-l + 1].transpose(), delta) * sp  
            self.grad_b[-l] += delta
            self.grad_w[-l] += np.dot(delta, self.neurons[-l - 1].transpose())

        Cp_a_out = np.dot(self.weights[0].transpose(), delta)

        return Cp_a_out

    def update(self, eta, lmbda, batch_size, n):
        """
        Update the network's weights and biases by applying gradient descent
        algorithm.
        ''eta'' is the learning rate
        ''lmbda'' is the regularization parameter
        ''n'' is the total size of the training data set
        """
        self.weights = [(1 - eta * (lmbda/n)) * w - (eta/batch_size) * delta_w\
                for w, delta_w in zip(self.weights, self.grad_w)]
        self.biases = [ b - (eta / batch_size) * delta_b\
                for b, delta_b in zip(self.biases, self.grad_b)]

        # Clear ''grad_w'' and ''grad_b'' so that they are not added to the 
        # next update pass
        for dw, db in zip(self.grad_w, self.grad_b):
            dw.fill(0)
            db.fill(0)
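
For reference, a minimal single-example training step using only the methods above might look like the following sketch (the layer sizes, input and hyperparameters are arbitrary illustration values; it assumes FullConnectedLayer, Sigmoid and CrossEntropyCost are available as defined in this question):

import numpy as np

# Hypothetical usage sketch: one forward/backward/update cycle on one example.
layer = FullConnectedLayer([2, 3, 1], activation = Sigmoid())

x = np.random.randn(2, 1)   # a single input column vector
y = np.array([[1.0]])       # the desired output

a = layer.feedforward(x)                        # forward pass
layer.backprop(CrossEntropyCost.Cp_a(a, y))     # accumulate grad_w, grad_b
layer.update(eta = 0.1, lmbda = 0.0, batch_size = 1, n = 1)  # gradient-descent step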

Here is the code for the tanh function:

class Tanh(Activation):

    @staticmethod
    def func(z):
        """ The functionality. """
        return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

    @staticmethod
    def prime(z):
        """ The derivative. """
        return 1. - Tanh.func(z) ** 2
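
As a side note, the exponential form above can overflow for large |z|, because np.exp(z) exceeds the float64 range around z ≈ 710. A numerically safer, mathematically equivalent sketch uses numpy's built-in tanh:

class Tanh(Activation):

    @staticmethod
    def func(z):
        """ The functionality (np.tanh avoids overflow for large |z|). """
        return np.tanh(z)

    @staticmethod
    def prime(z):
        """ The derivative: 1 - tanh(z)^2. """
        return 1. - np.tanh(z) ** 2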

Here is the code for the cross-entropy class:

class CrossEntropyCost(Cost):

    @staticmethod
    def func(a, y):
        """
        Return the cost associated with an output ''a'' and desired output
        ''y''. 
        Note that np.nan_to_num is used to ensure numerical stability. In
        particular, if both ''a'' and ''y'' have a 1.0 in the same slot, 
        then the expression (1-y) * np.log(1-a) returns nan. The np.nan_to_num
        ensures that it is converted to the correct value (0.0).
        """
        for ai in a:
            if ai < 0:
                print("in CrossEntropyCost.func(a, y)... require a_i > 0, a_i belong to a.")
                exit(1)

        return np.sum(np.nan_to_num(-y * np.log(a) - (1-y) * np.log(1-a)))

    @staticmethod
    def Cp_a(a, y):
        """
        Cp_a, dC/da: the derivative of C w.r.t a
        ''a'' is the output of neurons
        ''y'' is the expected output of neurons
        """
        #return (a - y) # delta
        return (a - y) / (a * (1 - a))

EDIT: It seems the problem is that the range of tanh is -1 to +1, which is invalid for cross-entropy. But if I still want a tanh activation together with a cross-entropy cost, how should I handle it?

This answer comes late, but I think it is worth mentioning.

If you want to use a tanh activation function, then instead of using the standard cross-entropy cost function, you can modify it so that it works with outputs between -1 and 1.

It would look something like this:

((1 + y)/2 * log(a)) + ((1 - y)/2 * log(1 - a))

Using this as the cost function will let you use the tanh activation.
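
A minimal sketch of that idea, written in the style of the CrossEntropyCost class from the question. The class name TanhCrossEntropyCost and the rescaling of both a and y from [-1, 1] to [0, 1] are my own additions (chosen so that the logarithms never see a non-positive argument); it assumes the targets y are encoded as -1/+1:

class TanhCrossEntropyCost(Cost):

    @staticmethod
    def func(a, y):
        """
        Cross-entropy adapted to tanh outputs: ''a'' in (-1, 1), ''y'' in {-1, +1}.
        Both are rescaled to (0, 1) before taking logs.
        """
        a01 = (a + 1.) / 2.
        y01 = (y + 1.) / 2.
        return np.sum(np.nan_to_num(-y01 * np.log(a01) - (1 - y01) * np.log(1 - a01)))

    @staticmethod
    def Cp_a(a, y):
        """
        dC/da w.r.t. the raw tanh output ''a''; the extra factor 0.5 is the
        chain rule through the rescaling a01 = (a + 1) / 2.
        """
        a01 = (a + 1.) / 2.
        y01 = (y + 1.) / 2.
        return (a01 - y01) / (a01 * (1 - a01)) * 0.5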

It looks like you are using tanh in the output layer, where tanh has the range (-1, +1) while the expected output is in the range (0, +1). This does not matter for sigmoid, which produces output in the (0, +1) range.

Cross-entropy expects its inputs to be probabilities in the range 0 to 1; tanh instead maps its input to values in the range -1 to 1, which cross-entropy cannot handle.

A possible fix is to rescale the input in the final step, where the input is the tanh output and the cost is cross-entropy, as sketched below.
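
A small sketch of that rescaling, reusing the unmodified CrossEntropyCost from the question (it assumes a FullConnectedLayer instance layer with tanh output, an input x, and targets y already in the range 0 to 1; the factor 0.5 is the chain rule through the rescaling):

a = layer.feedforward(x)      # a is in (-1, 1) because of tanh
a01 = (a + 1.) / 2.           # rescaled into (0, 1)

cost = CrossEntropyCost.func(a01, y)
# dC/da = dC/da01 * d(a01)/da = dC/da01 * 0.5
layer.backprop(CrossEntropyCost.Cp_a(a01, y) * 0.5)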
