
Train a logistic regression model with regularization from scratch

I am trying to implement a logistic regression model with regularization. I got stuck computing the gradient: when I run my gradient descent algorithm, the cost function actually increases rather than decreases.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def Probability(theta, X):
    return sigmoid(np.dot(X, theta))


def cost_function_regression(theta, x, y, Lambda):
    # Computes the cost function for all the training samples
    m = x.shape[0]
    total_cost = (-(1 / m) * np.sum(
        np.dot(y.T, np.log(Probability(theta, x))) + np.dot((1 - y).T, np.log(
            1 - Probability(theta, x))))) + (Lambda / 2) * np.sum(np.dot(theta, theta.T))
    return total_cost

def Gradient_regression(theta, X, y, Lambda):
    m = X.shape[0]

    grad = ((1 / m) * np.dot(X.T, Probability(theta, X) - y)) + np.sum((Lambda / m) * theta)
    return grad

We will start by establishing the theory, follow with a working example, and end with some comments.

Problem statement

The steps in fitting/training a logistic regression model (as with any supervised ML model) using the gradient descent method are as follows:

  1. Identify a hypothesis function [ h(X) ] with parameters [ w,b ]
  2. Identify a loss function [ J(w,b) ]
  3. Forward propagation: Make predictions using the hypothesis function [ y_hat = h(X) ]
  4. Calculate the error between the actual label [ y ] and the predicted label [ y_hat ] using the loss function.
  5. Backward propagation: Adjust the parameters in the hypothesis function based on the error (by calculating the gradients), using the update rule

    $$w_i := w_i - \alpha \frac{\partial J(w,b)}{\partial w_i}, \qquad b := b - \alpha \frac{\partial J(w,b)}{\partial b}$$

    where $\alpha$ is the learning rate.

  6. Go to step 3 if the gradients are still large, else stop.

Calculating gradients

Hypothesis function for logistic regression:

$$h(X) = \frac{1}{1 + e^{-\left(\sum_i w_i X^i + b\right)}}$$

Where X is a vector and X^i is the ith element of the vector.
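
As a minimal sketch (helper names here are illustrative, not part of the original post), the hypothesis can be evaluated for all samples at once with NumPy:

import numpy as np

def sigmoid(z):
    # Standard logistic function
    return 1 / (1 + np.exp(-z))

def hypothesis(X, w, b):
    # X: (m, d) sample matrix, w: (d,) weight vector, b: scalar bias
    # Returns the (m,) vector of predicted probabilities y_hat
    return sigmoid(X @ w + b)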

The commonly used loss function for logistic regression is log loss. The log loss with L2 regularization is:

$$J(w,b) = -\frac{1}{m}\sum_{j=1}^{m}\left[\,y_j \log \hat{y}_j + (1-y_j)\log(1-\hat{y}_j)\,\right] + \lambda\left(\sum_i w_i^2 + b^2\right)$$

where $\hat{y}_j = h(X_j)$ is the prediction for the $j$th of the $m$ training samples and $\lambda$ is the regularization strength (the bias $b$ is regularized along with the weights, and the toy example below uses $\lambda = 1$).
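
A direct NumPy translation of this loss might look like the following sketch, reusing the illustrative `hypothesis` helper from above:

def log_loss_l2(X, y, w, b, lam=1.0):
    # Mean cross-entropy over the m samples, plus the L2 penalty
    m = X.shape[0]
    y_hat = hypothesis(X, w, b)
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    penalty = lam * (np.sum(w ** 2) + b ** 2)
    return cross_entropy + penalty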

Let's calculate the gradients:

$$\frac{\partial J(w,b)}{\partial w_i} = -\frac{1}{m}\sum_{j=1}^{m}\left(y_j - \hat{y}_j\right)X_j^i + 2\lambda w_i$$

Similarly,

$$\frac{\partial J(w,b)}{\partial b} = -\frac{1}{m}\sum_{j=1}^{m}\left(y_j - \hat{y}_j\right) + 2\lambda b$$

where $X_j^i$ is the $i$th feature of the $j$th sample.
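
These two derivatives translate directly into NumPy (again a sketch, reusing the illustrative helpers above):

def gradients(X, y, w, b, lam=1.0):
    # Vectorized form of the two derivatives above
    m = X.shape[0]
    residual = y - hypothesis(X, w, b)          # (m,) vector of y - y_hat
    grad_w = -(X.T @ residual) / m + 2 * lam * w
    grad_b = -np.sum(residual) / m + 2 * lam * b
    return grad_w, grad_b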

Now that we know the gradients, let's code the gradient descent algorithm to fit the parameters of our logistic regression model.

Toy Example

import math
import numpy as np
from sklearn import datasets

# load data
iris = datasets.load_iris()
# Let's take only two classes
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize data to 0 mean and 1 std
X[:, 0] = (X[:, 0] - np.mean(X[:, 0]))/np.std(X[:, 0])
X[:, 1] = (X[:, 1] - np.mean(X[:, 1]))/np.std(X[:, 1])
X[:, 2] = (X[:, 2] - np.mean(X[:, 2]))/np.std(X[:, 2])
X[:, 3] = (X[:, 3] - np.mean(X[:, 3]))/np.std(X[:, 3])

def sigmoid(x):
    return 1 / (1+math.exp(-x))  

# initialize weights
w0, w1, w2, w3, b = 0.01,0.01,0.01,0.01,0.01
n = len(X)
# Learning rate
alpha = 0.01
# The gradient descent loop
while True:
    y_hat = [sigmoid(w0*x[0] + w1*x[1] + w2*x[2] + w3*x[3] + b) for x in X]
    delta_w0 = -np.sum([(y[j] - y_hat[j])*X[j,0] for j in range(n)])/n + 2*w0
    delta_w1 = -np.sum([(y[j] - y_hat[j])*X[j,1] for j in range(n)])/n + 2*w1
    delta_w2 = -np.sum([(y[j] - y_hat[j])*X[j,2] for j in range(n)])/n + 2*w2
    delta_w3 = -np.sum([(y[j] - y_hat[j])*X[j,3] for j in range(n)])/n + 2*w3
    delta_b = -np.sum([(y[j] - y_hat[j]) for j in range(n)])/n + 2*b

    w0 = w0 - alpha*delta_w0
    w1 = w1 - alpha*delta_w1
    w2 = w2 - alpha*delta_w2
    w3 = w3 - alpha*delta_w3

    b = b - alpha*delta_b

    if np.sum(np.abs([delta_w0, delta_w1, delta_w2, delta_w3, delta_b])) < 1e-5:
        break

# Make predictions
pred = [1 if i > 0.5 else 0 for i in y_hat]
# Count the number of correct predictions
correct  = np.sum([1 if pred[i] == y[i] else 0 for i in range(n)])
print (correct)

Comments

  1. The above toy example is coded in the most inefficient way on purpose; the intention was to show the steps clearly rather than to be efficient. That said, in practice you should vectorize the operations (using NumPy arrays and matrix operations) for efficiency, as sketched below.
  2. Data normalization is important.
  3. Models should be trained on the training data, with performance measured on held-out test/validation data.
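
One possible vectorized version of the toy example, as a sketch under the same assumptions ($\lambda = 1$, the bias regularized along with the weights, and the same stopping rule):

import numpy as np
from sklearn import datasets

# Load the data and keep only classes 0 and 1
iris = datasets.load_iris()
y = iris.target
X = iris.data[y != 2]
y = y[y != 2]

# Normalize every feature to 0 mean and 1 std in one step
X = (X - X.mean(axis=0)) / X.std(axis=0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Initialize weights, bias, and learning rate as in the toy example
w = np.full(X.shape[1], 0.01)
b = 0.01
alpha = 0.01
n = len(X)

while True:
    y_hat = sigmoid(X @ w + b)
    residual = y - y_hat
    delta_w = -(X.T @ residual) / n + 2 * w   # gradient w.r.t. the weights
    delta_b = -np.sum(residual) / n + 2 * b   # gradient w.r.t. the bias

    w -= alpha * delta_w
    b -= alpha * delta_b

    if np.sum(np.abs(delta_w)) + abs(delta_b) < 1e-5:
        break

# Threshold the probabilities and count correct predictions
pred = (y_hat > 0.5).astype(int)
print((pred == y).sum(), "correct predictions out of", n)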
