
Neural Network converging to zero output

I am trying to train this neural network to make predictions on some data. I tried it on a small dataset (around 100 records) and it worked like a charm. Then I plugged in the new dataset and found that the NN converges to 0 output, while the error converges approximately to the ratio between the number of positive examples and the total number of examples.
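In other words, the error just settles at the value an all-zero predictor would produce. A quick sanity check (with a made-up label vector that has roughly the same positive rate as my data):

import numpy as np

# hypothetical labels with roughly the same positive rate as my data (~22.25%)
y_check = np.array([1.0] * 2225 + [0.0] * 7775)

# if the network outputs all zeros, the mean absolute error is just the positive rate
print(np.mean(np.abs(y_check - np.zeros_like(y_check))))   # 0.2225
print(np.mean(y_check))                                    # same value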

My dataset is composed of yes/no features (1.0/0.0) and the ground truth is yes/no as well.

My suppositions:
1) There's a local minimum with output 0 (but I tried many values of the learning rate and initial weights, and it always seems to converge there).
2) My weight update is wrong (but it looks good to me).
3) It is just an output scaling problem. I tried to scale the output (i.e. output/max(output) and output/mean(output)) but the results are not good, as you can see in the code provided below. Should I scale it in a different way? Softmax? (See the small sketch right after this list.)
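For reference, here is what I mean by those rescalings (a rough sketch; the softmax variant assumes a hypothetical two-column output, one score per class, since the softmax of a single scalar is always 1):

import numpy as np

out = np.array([[0.0], [5.6e-06], [0.0], [1.3e-05]])   # example raw network outputs

scaled_by_max  = out / out.max()     # what the test code below does
scaled_by_mean = out / out.mean()

# softmax only makes sense with one output unit per class; with a hypothetical
# two-column output (score for "no", score for "yes") it would look like this:
scores = np.hstack([1.0 - out, out])
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax = exp_scores / exp_scores.sum(axis=1, keepdims=True)
print(softmax)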

Here is the code:

import pandas as pd
import numpy as np
import pickle
import random
from collections import defaultdict

alpha = 0.1         # learning rate
N_LAYERS = 10       # number of weight layers
N_ITER = 10         # training iterations
#N_FEATURES = 8
INIT_SCALE = 1.0    # weights drawn uniformly from [-INIT_SCALE/2, INIT_SCALE/2]

train = pd.read_csv("./data/prediction.csv")

# first 18000 rows are used for training, the rest for testing
y = train['y_true'].as_matrix()
y = np.vstack(y).astype(float)
ytest = y[18000:]
y = y[:18000]

X = train.drop(['y_true'], axis = 1).as_matrix()
Xtest = X[18000:].astype(float)
X = X[:18000]

# activation functions: deriv=True returns the derivative used in backpropagation
def tanh(x,deriv=False):
    if(deriv==True):
        return (1 - np.tanh(x)**2) * alpha
    else:
        return np.tanh(x)

def sigmoid(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    else:
        return 1/(1+np.exp(-x))

def relu(x,deriv=False):
    if(deriv==True):
        return 0.01 + 0.99*(x>0)
    else:
        return 0.01*x + 0.99*x*(x>0)

np.random.seed()

# weight matrices: N_LAYERS-1 square (n_features x n_features) layers, then one (n_features x 1) output layer
syn = defaultdict(np.array)

for i in range(N_LAYERS-1):
    syn[i] = INIT_SCALE * np.random.random((len(X[0]),len(X[0]))) - INIT_SCALE/2
syn[N_LAYERS-1] = INIT_SCALE * np.random.random((len(X[0]),1)) - INIT_SCALE/2

l = defaultdict(np.array)
delta = defaultdict(np.array)

for j in xrange(N_ITER):
    # forward pass: propagate the inputs through every layer
    l[0] = X
    for i in range(1,N_LAYERS+1):
        l[i] = relu(np.dot(l[i-1],syn[i-1]))

    error = (y - l[N_LAYERS])

    e = np.mean(np.abs(error))
    if (j% 1) == 0:
        print "\nIteration " + str(j) + " of " + str(N_ITER)
        print "Error: " + str(e)

    # backpropagation: compute each layer's delta, scaled by the learning rate
    delta[N_LAYERS] = error*relu(l[N_LAYERS],deriv=True) * alpha
    for i in range(N_LAYERS-1,0,-1):
        error = delta[i+1].dot(syn[i].T)
        delta[i] = error*relu(l[i],deriv=True) * alpha

    # weight update
    for i in range(N_LAYERS):
        syn[i] += l[i].T.dot(delta[i+1])



pickle.dump(syn, open('neural_weights.pkl', 'wb'))

# TESTING with f1-measure
# RECALL = TRUE POSITIVES / ( TRUE POSITIVES + FALSE NEGATIVES)
# PRECISION = TRUE POSITIVES / (TRUE POSITIVES + FALSE POSITIVES)

# forward pass on the held-out test set
l[0] = Xtest
for i in range(1,N_LAYERS+1):
    l[i] = relu(np.dot(l[i-1],syn[i-1]))

# rescale the outputs by their maximum before thresholding at 0.5
out = l[N_LAYERS]/max(l[N_LAYERS])

tp = float(0)
fp = float(0)
fn = float(0)
tn = float(0)

for i in l[N_LAYERS][:50]:
    print i

for i in range(len(ytest)):
    if out[i] > 0.5 and ytest[i] == 1:
        tp += 1
    if out[i] <= 0.5 and ytest[i] == 1:
        fn += 1
    if out[i] > 0.5 and ytest[i] == 0:
        fp += 1
    if out[i] <= 0.5 and ytest[i] == 0:
        tn += 1

print "tp: " + str(tp)
print "fp: " + str(fp)
print "tn: " + str(tn)
print "fn: " + str(fn)

print "\nprecision: " + str(tp/(tp + fp))
print "recall: " + str(tp/(tp + fn))

f1 = 2 * tp /(2 * tp + fn + fp)
print "\nf1-measure:" + str(f1)

And this is the output:

Iteration 0 of 10
Error: 0.222500767998

Iteration 1 of 10
Error: 0.222500771157

Iteration 2 of 10
Error: 0.222500774321

Iteration 3 of 10
Error: 0.22250077749

Iteration 4 of 10
Error: 0.222500780663

Iteration 5 of 10
Error: 0.222500783841

Iteration 6 of 10
Error: 0.222500787024

Iteration 7 of 10
Error: 0.222500790212

Iteration 8 of 10
Error: 0.222500793405

Iteration 9 of 10
Error: 0.222500796602


[ 0.]
[ 0.]
[  5.58610895e-06]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[  4.62182626e-06]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[  5.58610895e-06]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[  4.62182626e-06]
[ 0.]
[ 0.]
[  5.04501079e-10]
[  5.58610895e-06]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[ 0.]
[  5.04501079e-10]
[ 0.]
[ 0.]
[  4.62182626e-06]
[ 0.]
[  5.58610895e-06]
[ 0.]
[ 0.]
[ 0.]
[  5.58610895e-06]
[ 0.]
[ 0.]
[ 0.]
[  5.58610895e-06]
[ 0.]
[  1.31432294e-05]

tp: 28.0
fp: 119.0
tn: 5537.0
fn: 1550.0

precision: 0.190476190476
recall: 0.0177439797212

f1-measure:0.0324637681159

Based on your model, it's unlikely you would need 10 layers for your network to converge.

Try a 3-layer network with more hidden nodes. For the majority of feedforward problems you only need 1 hidden layer to converge effectively.

Deep NNs are much more difficult to train than shallow ones.

As others have said, your learning rate should be much smaller; [0.01, 0.3] is a decent range. Additionally, the number of iterations needs to be much greater.

10 layers is way too many.
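A minimal sketch of those suggestions applied to the code above: one hidden layer, a wider hidden width, a smaller learning rate and many more iterations. It keeps the question's leaky-relu and update convention and assumes X and y are the arrays already loaded there; the hidden width of 32 and alpha of 0.05 are arbitrary picks within the suggested range.

import numpy as np

alpha = 0.05          # learning rate within the suggested [0.01, 0.3] range
N_HIDDEN = 32         # arbitrary hidden width ("more hidden nodes")
N_ITER = 5000         # far more iterations than 10

def relu(x, deriv=False):
    if deriv:
        return 0.01 + 0.99 * (x > 0)
    return 0.01 * x + 0.99 * x * (x > 0)

# X, y as loaded in the question's code
np.random.seed(0)
syn0 = np.random.random((X.shape[1], N_HIDDEN)) - 0.5   # input  -> hidden
syn1 = np.random.random((N_HIDDEN, 1)) - 0.5            # hidden -> output

for j in range(N_ITER):
    # forward pass: input layer, one hidden layer, output layer (3 layers total)
    l1 = relu(np.dot(X, syn0))
    l2 = relu(np.dot(l1, syn1))

    # backpropagation with the same sign convention as the question's code
    l2_delta = (y - l2) * relu(l2, deriv=True) * alpha
    l1_delta = l2_delta.dot(syn1.T) * relu(l1, deriv=True) * alpha

    syn1 += l1.T.dot(l2_delta)
    syn0 += X.T.dot(l1_delta)

    if j % 500 == 0:
        print("iteration " + str(j) + ", error: " + str(np.mean(np.abs(y - l2))))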
