Does ReLU perform worse than sigmoid?

Question

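I used sigmoid on all layers and on the output and got a final error of 0.00012, but when I use ReLU, which is theoretically better, I get the worst possible results. Can anyone explain why this happens? I am using a very simple 2-layer implementation of the kind available on 100 websites, but I include it below anyway: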

import numpy as np
#test
#avg(nonlin(np.dot(nonlin(np.dot([0,0,1],syn0)),syn1)))
#returns list >> [predicted_output, confidence]
def nonlin(x,deriv=False):#Sigmoid
    if(deriv==True):
        return x*(1-x)

    return 1/(1+np.exp(-x))

def relu(x, deriv=False):#RELU
    if (deriv == True):
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                if x[i][k] > 0:
                    x[i][k] = 1
                else:
                    x[i][k] = 0
        return x
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            if x[i][k] > 0:
                pass  # do nothing since it would be effectively replacing x with x
            else:
                x[i][k] = 0
    return x

X = np.array([[0,0,1],
            [0,0,0],  
            [0,1,1],
            [1,0,1],
            [1,0,0],
            [0,1,0]])

y = np.array([[0],[1],[0],[0],[1],[1]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

def avg(i):
    # Map a network output to [predicted_class, confidence]
    if i > 0.5:
        confidence = i
        return [1, float(confidence)]
    else:
        confidence = 1.0 - float(i)
        return [0, confidence]

for j in range(500000):

    # Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = nonlin(np.dot(l0,syn0))
    l2 = nonlin(np.dot(l1,syn1))
    #print 'this is',l2,'\n'
    # how much did we miss the target value?
    l2_error = y - l2
    #print l2_error,'\n'
    if (j% 100000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))
        print(syn1)

    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*nonlin(l2,deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)

    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * nonlin(l1,deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)
print("Final Error:" + str(np.mean(np.abs(l2_error))))

def p(l):
    return avg(nonlin(np.dot(nonlin(np.dot(l,syn0)),syn1)))

So p(x) is the prediction function after training, where x is a 1 x 3 matrix of input values.

Answer 1

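Why do you say it is theoretically better? ReLU has been shown to work better in most applications, but that does not mean it is better in general. Your example is very simple, and the inputs are scaled to [0,1], the same as the outputs. This is exactly where I would expect sigmoid to perform well. In practice you do not see sigmoids in hidden layers because of the vanishing gradient problem and some other issues in large networks, but that is hardly a problem for you.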
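To make the vanishing-gradient remark concrete, here is a minimal sketch. It is not part of the original code, and the names `sigmoid`, `sigmoid_grad`, `max_grad`, and `bounds` are only illustrative. Unlike `nonlin(x, deriv=True)` in the question, `sigmoid_grad` takes the pre-activation rather than the sigmoid output. The sigmoid derivative never exceeds 0.25, so a gradient passed back through several sigmoid layers is scaled by at most 0.25 per layer (ignoring the weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid with respect to its input z
    s = sigmoid(z)
    return s * (1.0 - s)

# Sample the derivative on a grid that includes z = 0, where it peaks
z = np.linspace(-10.0, 10.0, 2001)
max_grad = sigmoid_grad(z).max()

# Upper bound on the activation-derivative factor after `depth` sigmoid layers
bounds = {depth: max_grad ** depth for depth in (1, 2, 5, 10)}
for depth, bound in sorted(bounds.items()):
    print(depth, bound)
```

With only two layers, as in the question, this factor is still large enough that training works fine, which is consistent with the low error reported for sigmoid.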

Also, if by any chance you are using the ReLU derivative, you are missing an "else" in the code. Your derivative will simply be overwritten.

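Just as a refresher, here is the definition of ReLU:

f(x) = max(0, x)

...which means it can blow your activations up to infinity. You want to avoid using ReLU on the last (output) layer.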
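As a rough sketch of that advice, the loop below keeps the question's data and weight shapes but uses ReLU only in the hidden layer and sigmoid at the output. The helper names, the learning rate `lr`, and the iteration count are my own assumptions, not values from the question, and I make no claim about the final error it reaches:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # 1 where the pre-activation is positive, 0 elsewhere
    return (z > 0).astype(float)

X = np.array([[0, 0, 1], [0, 0, 0], [0, 1, 1],
              [1, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
y = np.array([[0], [1], [0], [0], [1], [1]], dtype=float)

rng = np.random.RandomState(1)
syn0 = 2 * rng.random_sample((3, 4)) - 1
syn1 = 2 * rng.random_sample((4, 1)) - 1
lr = 0.1  # assumed step size; the question effectively uses 1.0

for _ in range(10000):
    z1 = X.dot(syn0)            # hidden pre-activations
    l1 = relu(z1)               # ReLU in the hidden layer only
    l2 = sigmoid(l1.dot(syn1))  # bounded output in (0, 1)

    l2_delta = (y - l2) * l2 * (1.0 - l2)
    l1_delta = l2_delta.dot(syn1.T) * relu_grad(z1)

    syn1 += lr * l1.T.dot(l2_delta)
    syn0 += lr * X.T.dot(l1_delta)

print("Mean absolute error:", np.mean(np.abs(y - l2)))
```

One thing to keep in mind: like the question's network, this sketch has no bias terms, so with a ReLU hidden layer the all-zero input row always yields exactly 0.5 at the output, whatever the weights are.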

On a side note, whenever possible, you should take advantage of vectorized operations:

def relu(x, deriv=False):  # ReLU
    if deriv:
        # Derivative: 1 where x > 0, else 0. Return a new array so the
        # caller's activations are not modified in place.
        return (x > 0).astype(float)
    else:  # HERE YOU WERE MISSING "ELSE"
        return np.maximum(0, x)

Yes, this is much faster than the nested if/else loops.
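If you want to check the speed claim on your own machine, something along these lines should do. It is only a sketch: `relu_loop` is my copy-based imitation of the nested-loop forward pass from the question, and the array size and repeat count are arbitrary:

```python
import timeit
import numpy as np

def relu_loop(x):
    # Element-by-element version, working on a copy of the input
    out = x.copy()
    for i in range(len(out)):
        for k in range(len(out[i])):
            if out[i][k] <= 0:
                out[i][k] = 0
    return out

def relu_vectorized(x):
    return np.maximum(0, x)

rng = np.random.RandomState(0)
a = rng.randn(200, 200)

# Both versions should produce the same array
same = np.array_equal(relu_loop(a), relu_vectorized(a))

loop_time = timeit.timeit(lambda: relu_loop(a), number=5)
vec_time = timeit.timeit(lambda: relu_vectorized(a), number=5)
print("same result:", same)
print("loop: %.4fs, vectorized: %.4fs" % (loop_time, vec_time))
```

The exact timings depend on your hardware and NumPy build, so treat the printed numbers as indicative only.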


Solution 1
1 Accepted 2017-06-04 10:28:43
