
Implementing gradient descent in Python

I was trying to build a gradient descent function in Python, using binary cross-entropy as the loss function and sigmoid as the activation function.

import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def binary_crossentropy(y_pred, y):
    # Clip predictions away from exactly 0 and 1 to avoid log(0).
    epsilon = 1e-15
    y_pred_new = np.array([max(i,epsilon) for i in y_pred])
    y_pred_new = np.array([min(i,1-epsilon) for i in y_pred_new])
    return -np.mean(y*np.log(y_pred_new) + (1-y)*np.log(1-y_pred_new))

def gradient_descent(X, y, epochs=10, learning_rate=0.5):
    features = X.shape[0]
    w = np.ones(shape=(features, 1))
    bias = 0
    n = X.shape[1]
    for i in range(epochs):
        # Forward pass: sigmoid of the weighted sum.
        weighted_sum = w.T@X + bias
        y_pred = sigmoid(weighted_sum)

        loss = binary_crossentropy(y_pred, y)

        # Gradients of the binary cross-entropy w.r.t. w and bias.
        d_w = (1/n)*(X@(y_pred-y).T)
        d_bias = np.mean(y_pred-y)

        w = w - learning_rate*d_w
        bias = bias - learning_rate*d_bias

        print(f'Epoch:{i}, weights:{w}, bias:{bias}, loss:{loss}')
    return w, bias

So, as input I gave:

X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4], 
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

and then ran w, bias = gradient_descent(X, y, epochs=100). The output was w = array([[-20.95],[-29.95]]), bias = -55.50000017801383, and loss: 40.406546076763014. The weights keep decreasing (becoming more negative), and the bias keeps decreasing as well with more epochs. The expected output was w = [[2],[-3]] and bias = 0.4.

I don't know what I am doing wrong; the loss is also not converging. It stays constant throughout all the epochs.

Usually, binary cross-entropy loss is used for binary classification tasks, where the targets are 0 or 1. Your targets come from a linear function and include negative values, which a sigmoid output confined to (0, 1) can never reach; that is why the loss never converges and the weights drift without bound. Since your task here is linear regression, I would prefer using Mean Squared Error as the loss function.
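
For reference, the update rules in the code below follow from differentiating the MSE loss (standard calculus, writing \hat{y} = w^\top X + b for the model's prediction over n samples):

L(w, b) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad
\frac{\partial L}{\partial w} = -\frac{2}{n}\, X\,(y - \hat{y})^\top, \qquad
\frac{\partial L}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)

Here is my suggestion: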

import numpy as np

def gradient_descent(X, y, epochs=1000, learning_rate=0.5):
    # One weight per feature (rows of X), initialized to 1.
    w = np.ones((X.shape[0], 1))
    bias = 1
    n = X.shape[1]  # number of samples (columns of X)

    for i in range(epochs):
        # Linear model: y_pred has shape (1, n).
        y_pred = w.T @ X + bias

        # Mean squared error over the n samples.
        mean_square_err = (1.0 / n) * np.sum(np.power((y - y_pred), 2))

        # Gradients of the MSE with respect to w and bias.
        d_w = (-2.0 / n) * (y - y_pred) @ X.T
        d_bias = (-2.0 / n) * np.sum(y - y_pred)

        w -= learning_rate * d_w.T
        bias -= learning_rate * d_bias

        print(f'Epoch:{i}, weights:{w}, bias:{bias}, loss:{mean_square_err}')

    return w, bias


X = np.array([[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.6, 0.2, 0.4],
              [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 0.4, 0.7]])
y = 2*X[0] - 3*X[1] + 0.4

w, bias = gradient_descent(X, y, epochs=5000, learning_rate=0.5)

print(f'w = {w}')
print(f'bias = {bias}')

Output:

w = [[ 1.99999999], [-2.99999999]]
bias = 0.40000000041096756
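
As a quick sanity check (a minimal sketch reusing X, y, w, and bias from the script above), you can confirm that the learned parameters reproduce the targets:

# Predictions from the learned parameters should match
# y = 2*X[0] - 3*X[1] + 0.4 up to numerical precision.
y_fit = (w.T @ X + bias).ravel()
print(np.allclose(y_fit, y))  # True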
