
Why does Dropout deteriorate my model accuracy?

If I don't use dropout during training, the code below gives about 95% accuracy. If I use dropout, the accuracy drops to 11%.

The network is built with NumPy. I use a Neural_Network class that holds a number of layer objects. The last layer has a sigmoid activation and the rest use ReLU. The code is:

import numpy as np 
import idx2numpy as idx
import matplotlib.pyplot as plt

np.random.seed(0)
img = r"C:\Users\Aaditya\OneDrive\Documents\ML\train-image"
lbl = r'C:\Users\Aaditya\OneDrive\Documents\ML\train-labels-idx1-ubyte'
t_lbl = r'C:\Users\Aaditya\OneDrive\Documents\ML\t10k-labels.idx1-ubyte'
t_img = r'C:\Users\Aaditya\OneDrive\Documents\ML\t10k-images.idx3-ubyte'
image = idx.convert_from_file(img)
iput = np.reshape(image, (60000,784))/255
otput = np.eye(10)[idx.convert_from_file(lbl)]
test_image = idx.convert_from_file(t_img)
test_input = np.reshape(test_image, (10000,784))/255
test_output = idx.convert_from_file(t_lbl)

def sigmoid(x):
    sigmoid = 1/(1+ np.exp(-x)) 
    return sigmoid
    
def tanh(x):
    return np.tanh(x)
def relu(x):
    return np.where(x>0,x,0)

def reluprime(x):
    return (x>0).astype(x.dtype)

def sigmoid_prime(x):
    return sigmoid(x)*(1-sigmoid(x))
    
def tanh_prime(x):
    return 1 - tanh(x)**2
class Layer_Dense:
    def __init__(self,n_inputs,n_neurons,activation="sigmoid",keep_prob=1):
        self.n_neurons=n_neurons
        if activation == "sigmoid":
            self.activation = sigmoid
            self.a_prime = sigmoid_prime
        elif activation == "tanh":
            self.activation = tanh
            self.a_prime = tanh_prime
        else :
            self.activation = relu
            self.a_prime = reluprime
        self.keep_prob = keep_prob
        self.weights = np.random.randn(n_inputs ,n_neurons)*0.1
        self.biases = np.random.randn(1,n_neurons)*0.1 
    
    def cal_output(self,input,train=False):        
        output = np.array(np.dot(input,self.weights) + self.biases,dtype="float128")
        
        if train == True:
            D = np.random.randn(1,self.n_neurons)
            self.D = (D>self.keep_prob).astype(int)
            output = output * self.D  
        return output
    def forward(self,input):
        return self.activation(self.cal_output(input))
    def back_propagate(self,delta,ap,lr=1,keep_prob=1):
        dz =  delta
        self.weights -= 0.001*lr*(np.dot(ap.T,dz)*self.D)
        self.biases -= 0.001*lr*(np.sum(dz,axis=0,keepdims=True)*self.D)
        return np.multiply(np.dot(dz,self.weights.T),(1-ap**2))
        

class Neural_Network:
    def __init__(self,input,output):
        self.input=input
        self.output=output
        self.layers = []
    def Add_layer(self,n_neurons,activation="relu",keepprob=1):
        if len(self.layers) != 0:    
            newL = Layer_Dense(self.layers[-1].n_neurons,n_neurons,activation,keep_prob=keepprob)
        else:
            newL = Layer_Dense(self.input.shape[1],n_neurons,activation,keep_prob=keepprob)
        self.layers.append(newL)
    def predict(self,input):
        output = input
        for layer in self.layers:
            output = layer.forward(output)
        return output
    def cal_zs(self,input):
        self.activations = []
        self.activations.append(input)
        output = input
        for layer in self.layers:
            z = layer.cal_output(output,train=True)
            activation = layer.activation(z)
            self.activations.append(activation)
            output = activation
    def train(self,input=None,output=None,lr=10):
        if input is None:
            input=self.input
            output=self.output
            
        if len(input)>1000:
            indices = np.arange(input.shape[0])
            np.random.shuffle(indices)
            input = input[indices]
            output = output[indices]
            for _ in range(100):
                self.lr = lr
                for i in range(int(len(input)/100)):
                    self.lr *=0.99
                    self.train(input[i*100:i*100+100],output[i*100:i*100+100],self.lr)
            return
        self.cal_zs(input)
        for i in range(1,len(self.layers)+1):
            if i==1:
                delta = self.activations[-1] - output
                self.delta = self.layers[-1].back_propagate(delta,self.activations[-2],lr)
            else:
                self.delta = self.layers[-i].back_propagate(self.delta,self.activations[-i-1],lr)
    def MSE(self):
        predict = self.predict(self.input)
        error = (predict - self.output)**2
        mse = sum(sum(error))
        print(mse)
    def Logloss(self):
        predict = self.predict(self.input)
        error = np.multiply(self.output,np.log(predict)) + np.multiply(1-self.output,np.log(1-predict))
        logloss = -1*sum(sum(error))
        print(logloss)
    def accuracy(self):
        predict = self.predict(test_input)
        prediction = np.argmax(predict,axis=1)
        correct = np.mean(prediction == test_output)
        print(correct*100)
            
    # def train(self,input,output):
        
model = Neural_Network(iput,otput)
# model.Add_layer(4)
model.Add_layer(64)
model.Add_layer(16)
model.Add_layer(10,"sigmoid")
lrc= 6
for _ in range(10):
    model.accuracy()
    model.Logloss()
    model.train(lr=lrc)
model.accuracy()

I used the MNIST database; the link is this one.

One reason could be that you are dropping too many neurons. In the code below:

D = np.random.randn(1,self.n_neurons)
self.D = (D>self.keep_prob).astype(int)

The matrix generated in the first line contains many values less than zero (and almost all of them are less than 1). So when it is compared against self.keep_prob (which is 1), most of the neurons end up being dropped.
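
As a quick check (this snippet is not part of the original post, just an illustration of the point above), you can measure how few standard-normal values exceed a keep_prob of 1:

import numpy as np

D = np.random.randn(1, 64)      # same distribution the layer uses for its mask
mask = (D > 1).astype(int)      # keep_prob == 1, as in the question
print(mask.mean())              # about 0.16 on average, so roughly 84% of neurons are zeroed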

Try this one change:

self.D = (D < self.keep_prob).astype(int)
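
Note that even with this change the mask is still drawn from a normal distribution, so the fraction of kept neurons does not equal keep_prob. The conventional (inverted) dropout mask draws from a uniform distribution and rescales by 1/keep_prob. A minimal sketch, assuming you want the standard formulation rather than the poster's exact code:

def dropout_mask(n_neurons, keep_prob):
    # uniform draw, so each neuron is kept with probability keep_prob
    D = np.random.rand(1, n_neurons) < keep_prob
    # rescale so the expected activation stays the same at train time
    return D.astype(float) / keep_prob

# inside cal_output, when train=True, one would then use:
# output = output * dropout_mask(self.n_neurons, self.keep_prob)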

This could be due to multiple reasons. One was pointed out by @anuragal.

Basically, dropout is used to reduce overfitting and to help the network correct its mistakes. But when you use dropout right before the last layer, it may be that the network cannot correct itself, which lowers the accuracy.

Another reason could be that, as I can see, your network is small. Usually, shallow networks don't benefit much from dropout.
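
For what it's worth, with the Add_layer API from the question this would mean enabling dropout only on the hidden layers and leaving the layer before the output untouched. A minimal sketch, assuming the mask bug above is fixed and using an arbitrary keep probability of 0.8:

model = Neural_Network(iput, otput)
model.Add_layer(64, keepprob=0.8)   # dropout on hidden layers only
model.Add_layer(16, keepprob=0.8)
model.Add_layer(10, "sigmoid")      # no dropout right before the output layer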
