
Train a logistic regression with regularization model from scratch

I am trying to implement a Logistic Regression model with regularisation. I got stuck computing the gradient: when I run my gradient descent algorithm, the cost function actually increases rather than decreases.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Probability(theta, X):
    return sigmoid(np.dot(X,theta))


def cost_function_regression(theta, x, y, Lambda):
    # Computes the cost function for all the training samples
    m = x.shape[0]
    total_cost = (-(1 / m) * np.sum(
        np.dot(y.T, np.log(Probability(theta, x)))
        + np.dot((1 - y).T, np.log(1 - Probability(theta, x)))
    )) + (Lambda / 2) * np.sum(np.dot(theta, theta.T))
    return total_cost

def Gradient_regression(theta, X, y, Lambda):
    m = X.shape[0]
    grad = ((1/m) * np.dot(X.T, Probability(theta, X) - y)) + np.sum((Lambda/m) * theta)
    return grad

We will start by establishing the theory, follow with a working example, and end with some comments.

Problem statement

The steps in fitting/training a logistic regression model (as with any supervised ML model) using the gradient descent method are as below:

  1. Identify a hypothesis function [ h(X) ] with parameters [ w,b ]
  2. Identify a loss function [ J(w,b) ]
  3. Forward propagation: Make predictions using the hypothesis function [ y_hat = h(X) ]
  4. Calculate the error between the actual label [ y ] and the predicted label [ y_hat ] using the loss function.
  5. Backward propagation: Adjust the parameters in the hypothesis function based on the error (by calculating the gradients), using the update rule (a small code snippet of this update follows the list)

    w := w - α * ∂J(w,b)/∂w
    b := b - α * ∂J(w,b)/∂b        (where α is the learning rate)

  6. Go to step 3 if the gradients are still large, else end
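
In code, the update in step 5 is just a subtraction of the scaled gradients. A minimal illustration with made-up numbers (dJ_dw and dJ_db here are hypothetical values, not computed from data):

import numpy as np

# Step-5 update rule on made-up numbers: new_param = param - alpha * gradient
alpha = 0.01                                # learning rate
w = np.array([0.01, 0.01, 0.01, 0.01])      # current weights
b = 0.01                                    # current bias
dJ_dw = np.array([0.3, -0.1, 0.2, 0.05])    # hypothetical gradient w.r.t. w
dJ_db = 0.4                                 # hypothetical gradient w.r.t. b
w = w - alpha * dJ_dw
b = b - alpha * dJ_db
print(w, b)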

Calculating gradients

Hypothesis function for logistic regression:

h(X) = sigmoid( Σ_i w_i * X^i + b ) = 1 / ( 1 + exp( -( Σ_i w_i * X^i + b ) ) )

Where X is a vector and X^i is the ith element of the vector.
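
As a small illustration, the hypothesis can be evaluated directly with NumPy; the weights and the sample below are made-up numbers:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# h(X) = sigmoid(sum_i w_i * X^i + b) for a single sample X (a 1-D feature vector)
def hypothesis(w, b, X):
    return sigmoid(np.dot(w, X) + b)

w = np.array([0.1, -0.2, 0.3, 0.05])   # made-up weights
b = 0.01                               # made-up bias
X = np.array([1.0, 2.0, 0.5, -1.0])    # made-up sample with 4 features
print(hypothesis(w, b, X))             # a probability in (0, 1)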

The commonly used loss function for logistic regression is log loss. The log loss with l2 regularization is:

J(w,b) = -(1/n) * Σ_{j=1..n} [ y_j * log(y_hat_j) + (1 - y_j) * log(1 - y_hat_j) ] + λ * ( Σ_i w_i² + b² )

where y_hat_j = h(X_j), n is the number of training samples and λ is the regularization strength.
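
A minimal NumPy sketch of this loss (λ is spelled lam because lambda is a Python keyword; the toy example below effectively uses lam = 1):

import numpy as np

def log_loss_l2(w, b, X, y, lam=1.0):
    # predictions for all n samples at once: X has shape (n, d), w has shape (d,)
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))
    # average cross-entropy over the n samples
    ce = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # l2 penalty on the weights and the bias
    penalty = lam * (np.sum(w ** 2) + b ** 2)
    return ce + penalty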

Let's calculate the gradients:

∂J(w,b)/∂w_i = -(1/n) * Σ_{j=1..n} (y_j - y_hat_j) * X_j^i + 2 * λ * w_i

Similarly,

∂J(w,b)/∂b = -(1/n) * Σ_{j=1..n} (y_j - y_hat_j) + 2 * λ * b
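
These two expressions translate directly into NumPy; the sketch below mirrors the toy example that follows (which uses lam = 1):

import numpy as np

def gradients(w, b, X, y, lam=1.0):
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))
    # dJ/dw_i = -(1/n) * sum_j (y_j - y_hat_j) * X_j^i + 2*lam*w_i
    dw = -(X.T @ (y - y_hat)) / len(y) + 2 * lam * w
    # dJ/db = -(1/n) * sum_j (y_j - y_hat_j) + 2*lam*b
    db = -np.mean(y - y_hat) + 2 * lam * b
    return dw, db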

Now that we know the gradients, let's code the gradient descent algorithm to fit the parameters of our logistic regression model.

Toy Example

import math
import numpy as np
from sklearn import datasets

# load data
iris = datasets.load_iris()
# Let's take only two classes
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize data to 0 mean and 1 std
X[:, 0] = (X[:, 0] - np.mean(X[:, 0]))/np.std(X[:, 0])
X[:, 1] = (X[:, 1] - np.mean(X[:, 1]))/np.std(X[:, 1])
X[:, 2] = (X[:, 2] - np.mean(X[:, 2]))/np.std(X[:, 2])
X[:, 3] = (X[:, 3] - np.mean(X[:, 3]))/np.std(X[:, 3])

def sigmoid(x):
    return 1 / (1+math.exp(-x))  

# initialize weights
w0, w1, w2, w3, b = 0.01,0.01,0.01,0.01,0.01
n = len(X)
# Learning rate
alpha = 0.01
# The gradient descent loop
while True:
    y_hat = [sigmoid(w0*x[0] + w1*x[1] + w2*x[2] + w3*x[3] + b) for x in X]
    # gradients of the regularized log loss; the 2*w and 2*b terms come from the l2 penalty (lambda = 1)
    delta_w0 = -np.sum([(y[j] - y_hat[j])*X[j,0] for j in range(n)])/n + 2*w0
    delta_w1 = -np.sum([(y[j] - y_hat[j])*X[j,1] for j in range(n)])/n + 2*w1
    delta_w2 = -np.sum([(y[j] - y_hat[j])*X[j,2] for j in range(n)])/n + 2*w2
    delta_w3 = -np.sum([(y[j] - y_hat[j])*X[j,3] for j in range(n)])/n + 2*w3
    delta_b = -np.sum([(y[j] - y_hat[j]) for j in range(n)])/n + 2*b

    w0 = w0 - alpha*delta_w0
    w1 = w1 - alpha*delta_w1
    w2 = w2 - alpha*delta_w2
    w3 = w3 - alpha*delta_w3

    b = b - alpha*delta_b

    if np.sum(np.abs([delta_w0, delta_w1, delta_w2, delta_w3, delta_b])) < 1e-5:
        break

# Make predictions
pred = [1 if i > 0.5 else 0 for i in y_hat]
# Find the number of correct predictions
correct  = np.sum([1 if pred[i] == y[i] else 0 for i in range(n)])
print (correct)

Comments

  1. The above toy example is coded in the most inefficient way. The intention was to show the steps clearly rather than efficiency. That said, in practice we would vectorize the operations (using NumPy arrays and matrix operations) for efficiency; a vectorized sketch follows these comments.
  2. Data normalization is important.
  3. The models are trained on the train data and the performance is measured on test/validation data.
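
As a follow-up to comment 1, here is a vectorized sketch of the same training loop (same data preparation and the same lambda = 1 penalty as the toy example above):

import numpy as np
from sklearn import datasets

# Same two-class iris data, normalized column-wise
iris = datasets.load_iris()
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]
X = (X - X.mean(axis=0)) / X.std(axis=0)

w = np.full(X.shape[1], 0.01)   # weights
b = 0.01                        # bias
alpha = 0.01                    # learning rate
n = len(X)

while True:
    y_hat = 1 / (1 + np.exp(-(X @ w + b)))          # forward propagation
    dw = -(X.T @ (y - y_hat)) / n + 2 * w           # gradients (l2 penalty, lambda = 1)
    db = -np.mean(y - y_hat) + 2 * b
    w = w - alpha * dw                              # update rule
    b = b - alpha * db
    if np.sum(np.abs(dw)) + abs(db) < 1e-5:         # stop when gradients are small
        break

pred = (y_hat > 0.5).astype(int)
print(np.sum(pred == y))        # number of correct predictions on the training data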
