
SGD Classifier with Logloss and L2 regularization using SGD without sklearn (Python)

I'm working on an assignment that asks for a manual implementation of SGD in Python. I'm stuck on the dw gradient function.

import numpy as np 
import pandas as pd 
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, n_features=15, n_informative=10, n_redundant=5,
                           n_classes=2, weights=[0.7], class_sep=0.7, random_state=15)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=15)

def initialize_weights(dim):
    w=np.zeros_like(dim)
    b=0
    return w,b
dim=X_train[0] 
w,b = initialize_weights(dim)
print('w =',(w))
print('b =',str(b))

import math
def sigmoid(z):
    ''' In this function, we will return sigmoid of z'''
    # compute sigmoid(z) and return
    test_neg_int = -z
    sig_z = 1/(1 + math.exp(test_neg_int))

    return sig_z

import math
def logloss(y_true,y_pred):
    '''In this function, we will compute log loss '''
    n = len(y_true)
    loss = -(1.0/n)*sum([y_true[i]*math.log(y_pred[i], 10) + (1.0-y_true[i])*math.log(1.0-y_pred[i], 10)
                         for i in range(n)])
    return loss

def gradient_dw(x,y,w,b,alpha,N):
    '''In this function, we will compute the gradient w.r.t. w '''
    for n in range(0,len(x)):
        dw=[] 
 # y=0, x= 15 array values, w= 15 array values of 0, b=0, alpha=0.0001, n=len(X_train)=37500
        lambda_val = 0.01
        d = x[n]*((y-alpha*((w.T)*x[n]+b)) - ((lambda_val*w)/N))
        dw.append(d)
    print (dw)

def grader_dw(x,y,w,b,alpha,N):
    grad_dw=gradient_dw(x,y,w,b,alpha,N)
    assert(np.sum(grad_dw)==2.613689585)
    return True
grad_x=np.array([-2.07864835,  3.31604252, -0.79104357, -3.87045546, -1.14783286,
   -2.81434437, -0.86771071, -0.04073287,  0.84827878,  1.99451725,
    3.67152472,  0.01451875,  2.01062888,  0.07373904, -5.54586092])
grad_y=0
grad_w,grad_b=initialize_weights(grad_x)
alpha=0.0001
N=len(X_train)
grader_dw(grad_x,grad_y,grad_w,grad_b,alpha,N)

Result I'm getting:

[array([-0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
     -0., -0.])]
  ---------------------------------------------------------------------------
 AssertionError                            Traceback (most recent call last)
<ipython-input-168-a3ed60706dc2> in <module>
     10 alpha=0.0001
     11 N=len(X_train)
---> 12 grader_dw(grad_x,grad_y,grad_w,grad_b,alpha,N)

<ipython-input-168-a3ed60706dc2> in grader_dw(x, y, w, b, alpha, N)
      1 def grader_dw(x,y,w,b,alpha,N):
      2     grad_dw=gradient_dw(x,y,w,b,alpha,N)
----> 3     assert(np.sum(grad_dw)==2.613689585)
      4     return True
      5 grad_x=np.array([-2.07864835,  3.31604252, -0.79104357, -3.87045546, -1.14783286,

AssertionError: 

Expected result:

True

Could you please tell me if my understanding of the gradient_dw function is wrong? I'm trying to apply this formula:

dw(t) = x_n * (y_n − σ(w(t)^T * x_n + b(t))) − (λ * w(t)) / N

I'm trying to compute the gradient w.r.t. 'w' in the gradient_dw function so that I can use it later in the main code. What I don't understand is this: w is an array of zeros and y = 0, so when we apply the dw(t) formula and return dw, we should get an array of zeros. Yet the grader asserts np.sum(grad_dw) == 2.613689585. How could we possibly get 2.613689585?

Try this:

try:
   assert()
except AssertionError:
   return True

Your approach here is wrong:

  1. In stochastic gradient descent we iterate through the 'n' data points (since the batch size is 1), not through the 'd' dimensions. Here you are iterating through the 'd' dimensions (a loop sketch follows the corrected function below).

  2. grad_x = np.array([-2.07864835, 3.31604252, -0.79104357, -3.87045546, -1.14783286, -2.81434437, -0.86771071, -0.04073287, 0.84827878, 1.99451725, 3.67152472, 0.01451875, 2.01062888, 0.07373904, -5.54586092]) is a single point with 15 dimensions.

So modify your function like below. It would work.

    def gradient_dw(x,y,w,b,alpha,N):
       '''In this function, we will compute the gradient w.r.t. w '''
       dw=x * (y-sigmoid(np.dot(w.T,x)+b)) -(alpha * w)/N

       return dw
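
For context, here is a minimal sketch of how a per-point gradient like this is typically consumed in the main training loop (iterating over the n points with batch size 1, as described in point 1 above). The epoch count, the learning rate eta, and the companion gradient_db helper are assumptions added for illustration; they are not part of the original post:

    def gradient_db(x, y, w, b):
        # assumed companion gradient w.r.t. b (not in the original post): db = y - sigmoid(w.T x + b)
        return y - sigmoid(np.dot(w.T, x) + b)

    def train(X_train, y_train, epochs=10, eta=0.0001, alpha=0.0001):
        # eta = learning rate (assumed); alpha plays the role of lambda in the dw formula
        w, b = initialize_weights(X_train[0])
        N = len(X_train)
        for epoch in range(epochs):
            for i in range(N):                     # iterate over the n points, batch size 1
                dw = gradient_dw(X_train[i], y_train[i], w, b, alpha, N)
                db = gradient_db(X_train[i], y_train[i], w, b)
                w = w + eta * dw                   # dw as defined is the descent direction, so we add it
                b = b + eta * db
        return w, b

The update adds eta*dw because dw, as written above, is the negative gradient of the regularized logloss for that point, i.e. it already points in the descent direction.
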
def gradient_dw(x,y,w,b,alpha,N):

   dw = x*(y - sigmoid(np.dot(w.T, x) + b) - (alpha/N)*w)
   return dw

This is the solution:

def gradient_dw(x,y,w,b,alpha,N):

    dw = x*(y - sigmoid(np.dot(w, x) + b)) - (alpha*w)/N
    return dw
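
As a sanity check on the value the grader expects (assuming any of the vectorized gradient_dw versions above, called with the grader's inputs): with w all zeros and b = 0, sigmoid(w.T x + b) = sigmoid(0) = 0.5 rather than 0, so dw = x * (0 - 0.5) = -0.5 * x, and np.sum(dw) = -0.5 * np.sum(grad_x), which is approximately 2.613689585. This is why the expected value is non-zero even though w and y are zero:

    dw = gradient_dw(grad_x, grad_y, grad_w, grad_b, alpha, N)
    print(np.sum(dw))                 # ~2.613689585
    print(-0.5 * np.sum(grad_x))      # same value, since sigmoid(0) = 0.5 and w = 0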
