
Train a logistic regression model with regularization from scratch

I am trying to implement a logistic regression model with regularization. I got stuck computing the gradient: when I run my gradient descent algorithm, the cost function actually increases rather than decreases.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Probability(theta, X):
    return sigmoid(np.dot(X, theta))


def cost_function_regression(theta, x, y, Lambda):
    # Computes the cost function for all the training samples
    m = x.shape[0]
    total_cost = (-(1 / m) * np.sum(
        np.dot(y.T, np.log(Probability(theta, x))) + np.dot((1 - y).T, np.log(
            1 - Probability(theta, x))))) + (Lambda / 2) * np.sum(np.dot(theta, theta.T))
    return total_cost

def Gradient_regression(theta, X, y, Lambda):
    m = X.shape[0]

    grad = ((1 / m) * np.dot(X.T, Probability(theta, X) - y)) + np.sum((Lambda / m) * theta)
    return grad

We will start by establishing the theory, follow with a working example, and end with some comments.

Problem statement

The steps in fitting/training a logistic regression model (as with any supervised ML model) using the gradient descent method are as follows:

  1. Identify a hypothesis function [ h(X) ] with parameters [ w,b ]
  2. Identify a loss function [ J(w,b) ]
  3. Forward propagation: Make predictions using the hypothesis function [ y_hat = h(X) ]
  4. Calculate the error between the actual label [ y ] and the predicted label [ y_hat ] using the loss function.
  5. Backward propagation: Adjust the parameters in the hypothesis function based on the error (by calculating the gradients), using the update rule

    $$w_i := w_i - \alpha \frac{\partial J(w,b)}{\partial w_i}, \qquad b := b - \alpha \frac{\partial J(w,b)}{\partial b}$$

    where $\alpha$ is the learning rate.

  6. Go to step 3 if the gradients are still large, else stop.

Calculating gradients

Hypothesis function for logistic regression:

$$h(X) = \frac{1}{1 + e^{-\left(\sum_i w_i X^i + b\right)}}$$

Where X is a vector and X^i is the ith element of the vector.
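
As a minimal sketch (helper names here are illustrative, not part of the original post), the hypothesis can be evaluated for all samples at once with NumPy:

import numpy as np

def sigmoid(z):
    # Standard logistic function
    return 1 / (1 + np.exp(-z))

def hypothesis(X, w, b):
    # X: (m, d) sample matrix, w: (d,) weight vector, b: scalar bias
    # Returns the (m,) vector of predicted probabilities y_hat
    return sigmoid(X @ w + b)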

The commonly used loss function for logistic regression is log loss. The log loss with L2 regularization is:

$$J(w,b) = -\frac{1}{m}\sum_{j=1}^{m}\left[\,y_j \log \hat{y}_j + (1-y_j)\log(1-\hat{y}_j)\,\right] + \lambda\left(\sum_i w_i^2 + b^2\right)$$

where $\hat{y}_j = h(X_j)$ is the prediction for the $j$th of the $m$ training samples and $\lambda$ is the regularization strength (the bias $b$ is regularized along with the weights, and the toy example below uses $\lambda = 1$).
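
A direct NumPy translation of this loss might look like the following sketch, reusing the illustrative `hypothesis` helper from above:

def log_loss_l2(X, y, w, b, lam=1.0):
    # Mean cross-entropy over the m samples, plus the L2 penalty
    m = X.shape[0]
    y_hat = hypothesis(X, w, b)
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    penalty = lam * (np.sum(w ** 2) + b ** 2)
    return cross_entropy + penalty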

Let's calculate the gradients:

$$\frac{\partial J(w,b)}{\partial w_i} = -\frac{1}{m}\sum_{j=1}^{m}\left(y_j - \hat{y}_j\right)X_j^i + 2\lambda w_i$$

Similarly,

$$\frac{\partial J(w,b)}{\partial b} = -\frac{1}{m}\sum_{j=1}^{m}\left(y_j - \hat{y}_j\right) + 2\lambda b$$

where $X_j^i$ is the $i$th feature of the $j$th sample.
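
These two derivatives translate directly into NumPy (again a sketch, reusing the illustrative helpers above):

def gradients(X, y, w, b, lam=1.0):
    # Vectorized form of the two derivatives above
    m = X.shape[0]
    residual = y - hypothesis(X, w, b)          # (m,) vector of y - y_hat
    grad_w = -(X.T @ residual) / m + 2 * lam * w
    grad_b = -np.sum(residual) / m + 2 * lam * b
    return grad_w, grad_b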

Now that we know the gradients, let's code the gradient descent algorithm to fit the parameters of our logistic regression model.

Toy Example

import math
import numpy as np
from sklearn import datasets

# load data
iris = datasets.load_iris()
# Let's take only two classes
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize data to 0 mean and 1 std
X[:, 0] = (X[:, 0] - np.mean(X[:, 0]))/np.std(X[:, 0])
X[:, 1] = (X[:, 1] - np.mean(X[:, 1]))/np.std(X[:, 1])
X[:, 2] = (X[:, 2] - np.mean(X[:, 2]))/np.std(X[:, 2])
X[:, 3] = (X[:, 3] - np.mean(X[:, 3]))/np.std(X[:, 3])

def sigmoid(x):
    return 1 / (1+math.exp(-x))  

# initialize weights
w0, w1, w2, w3, b = 0.01,0.01,0.01,0.01,0.01
n = len(X)
# Learning rate
alpha = 0.01
# The gradient descent loop
while True:
    y_hat = [sigmoid(w0*x[0] + w1*x[1] + w2*x[2] + w3*x[3] + b) for x in X]
    delta_w0 = -np.sum([(y[j] - y_hat[j])*X[j,0] for j in range(n)])/n + 2*w0
    delta_w1 = -np.sum([(y[j] - y_hat[j])*X[j,1] for j in range(n)])/n + 2*w1
    delta_w2 = -np.sum([(y[j] - y_hat[j])*X[j,2] for j in range(n)])/n + 2*w2
    delta_w3 = -np.sum([(y[j] - y_hat[j])*X[j,3] for j in range(n)])/n + 2*w3
    delta_b = -np.sum([(y[j] - y_hat[j]) for j in range(n)])/n + 2*b

    w0 = w0 - alpha*delta_w0
    w1 = w1 - alpha*delta_w1
    w2 = w2 - alpha*delta_w2
    w3 = w3 - alpha*delta_w3

    b = b - alpha*delta_b

    if np.sum(np.abs([delta_w0, delta_w1, delta_w2, delta_w3, delta_b])) < 1e-5:
        break

# Make predictions
pred = [1 if i > 0.5 else 0 for i in y_hat]
# Count the number of correct predictions
correct  = np.sum([1 if pred[i] == y[i] else 0 for i in range(n)])
print (correct)

Comments

  1. The above toy example is coded in the most inefficient way on purpose; the intention was to show the steps clearly rather than to be efficient. That said, in practice you should vectorize the operations (using NumPy arrays and matrix operations) for efficiency, as sketched below.
  2. Data normalization is important.
  3. Models should be trained on the training data, with performance measured on held-out test/validation data.
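
One possible vectorized version of the toy example, as a sketch under the same assumptions ($\lambda = 1$, the bias regularized along with the weights, and the same stopping rule):

import numpy as np
from sklearn import datasets

# Load the data and keep only classes 0 and 1
iris = datasets.load_iris()
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize every feature to 0 mean and 1 std in one step
X = (X - X.mean(axis=0)) / X.std(axis=0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Initialize weights, bias, and learning rate as in the toy example
w = np.full(X.shape[1], 0.01)
b = 0.01
alpha = 0.01
n = len(X)

while True:
    y_hat = sigmoid(X @ w + b)
    residual = y - y_hat
    delta_w = -(X.T @ residual) / n + 2 * w   # gradient w.r.t. the weights
    delta_b = -np.sum(residual) / n + 2 * b   # gradient w.r.t. the bias

    w -= alpha * delta_w
    b -= alpha * delta_b

    if np.sum(np.abs(delta_w)) + abs(delta_b) < 1e-5:
        break

# Threshold the probabilities and count correct predictions
pred = (y_hat > 0.5).astype(int)
print((pred == y).sum(), "correct predictions out of", n)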
