
Implementing Gradient Descent in Python and Receiving an Overflow Error

Gradient Descent and Overflow Error

I am currently implementing vectorized gradient descent in Python. However, I continue to get an overflow error. The numbers in my dataset are not extremely large, though. I am using this formula:

[Image: formula for vectorized gradient descent]

I chose this implementation to avoid using derivatives. Does anyone have any suggestions on how to remedy this problem, or am I implementing it wrong? Thank you in advance!

Dataset link: https://www.kaggle.com/CooperUnion/anime-recommendations-database/data

## Cleaning Data ##
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv('anime.csv')
# print(data.corr())
# print(data['members'].isnull().values.any()) # Prints False
# print(data['rating'].isnull().values.any()) # Prints True

members = [] # Corresponding fan club size for row 
ratings = [] # Corresponding rating for row

for row in data.iterrows():
    if not math.isnan(row[1]['rating']): # Checks for Null ratings
        members.append(row[1]['members'])
        ratings.append(row[1]['rating'])


plt.scatter(members, ratings) # Scatter plot of fan club size vs. rating
plt.savefig('scatterplot.png')

theta0 = 0.3 # Random guess
theta1 = 0.3 # Random guess
error = 0

Formulas

def hypothesis(x, theta0, theta1):
    return  theta0 + theta1 * x

def costFunction(x, y, theta0, theta1, m):
    loss = 0 
    for i in range(m): # Represents summation
        loss += (hypothesis(x[i], theta0, theta1) - y[i])**2
    loss *= 1 / (2 * m) # Represents 1/2m
    return loss

def gradientDescent(x, y, theta0, theta1, alpha, m, iterations=1500):
    for i in range(iterations):
        gradient0 = 0
        gradient1 = 0
        for j in range(m):
            gradient0 += hypothesis(x[j], theta0, theta1) - y[j]
            gradient1 += (hypothesis(x[j], theta0, theta1) - y[j]) * x[j]
        gradient0 *= 1/m
        gradient1 *= 1/m
        temp0 = theta0 - alpha * gradient0
        temp1 = theta1 - alpha * gradient1
        theta0 = temp0
        theta1 = temp1
        error = costFunction(x, y, theta0, theta1, len(y))
        print("Error is:", error)
    return theta0, theta1

print(gradientDescent(members, ratings, theta0, theta1, 0.01, len(ratings)))
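Since the goal is a vectorized implementation, the inner summation loops above map directly onto NumPy array operations. A minimal sketch of the same update (function and variable names are illustrative, and the learning-rate/feature-scale issue still applies to the raw `members` values):

```python
import numpy as np

def gradient_descent_vec(x, y, theta0=0.3, theta1=0.3, alpha=0.01, iterations=1500):
    """Batch gradient descent for y ~ theta0 + theta1 * x, with the
    per-sample loops replaced by NumPy array operations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for _ in range(iterations):
        residual = theta0 + theta1 * x - y       # h(x_j) - y_j for every j at once
        theta0 -= alpha * residual.mean()        # (1/m) * sum of residuals
        theta1 -= alpha * (residual * x).mean()  # (1/m) * sum of residuals * x
    return theta0, theta1
```

On a small, well-scaled toy problem such as y = 2 + 3x this recovers the intercept and slope; on the raw `members` column it diverges exactly like the loop version, for the reasons discussed in the answer below.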

Errors

After several iterations, the costFunction called within my gradientDescent function gives me an OverflowError: (34, 'Result too large'). However, I expect my code to continually print a decreasing error value.

    Error is: 1.7515692852199285e+23
    Error is: 2.012089675182454e+38
    Error is: 2.3113586742689143e+53
    Error is: 2.6551395730578252e+68
    Error is: 3.05005286756189e+83
    Error is: 3.503703756035943e+98
    Error is: 4.024828599077087e+113
    Error is: 4.623463163528686e+128
    Error is: 5.311135890211131e+143
    Error is: 6.101089907410428e+158
    Error is: 7.008538065634975e+173
    Error is: 8.050955905074458e+188
    Error is: 9.248418197694096e+203
    Error is: 1.0623985545062037e+219
    Error is: 1.220414847696018e+234
    Error is: 1.4019337603196565e+249
    Error is: 1.6104509643047377e+264
    Error is: 1.8499820618048921e+279
    Error is: 2.1251399172389593e+294
    Traceback (most recent call last):
      File "tyreeGradientDescent.py", line 54, in <module>
        print(gradientDescent(members, ratings, theta0, theta1, 0.01, len(ratings)))
      File "tyreeGradientDescent.py", line 50, in gradientDescent
        error = costFunction(x, y, theta0, theta1, len(y))
      File "tyreeGradientDescent.py", line 33, in costFunction
        loss += (hypothesis(x[i], theta0, theta1) - y[i])**2
    OverflowError: (34, 'Result too large')

Your data values are really very large, which makes your loss function very steep. The result is that you need a tiny alpha unless you normalize your data to smaller values. With an alpha value that is too large, your gradient descent hops all over the place and actually diverges, which is why your error goes up rather than down.

With your current data, an alpha of 0.0000000001 will make the error converge. After 30 iterations my loss went from:

Error is: 66634985.91339202

to

Error is: 16.90452378179708
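The alternative to shrinking alpha is to normalize the feature. A minimal sketch with made-up fan-club sizes and ratings standing in for the anime.csv columns (the exact numbers are hypothetical): min-max scaling `members` into [0, 1] lets the original alpha of 0.01 converge instead of overflowing.

```python
import numpy as np

# Made-up values standing in for the 'members' and 'rating' columns.
members = np.array([51.0, 12000.0, 114262.0, 340000.0, 793665.0])
ratings = np.array([6.2, 7.1, 7.5, 8.0, 9.3])

# Min-max scale the feature into [0, 1] so alpha = 0.01 no longer diverges.
members_scaled = (members - members.min()) / (members.max() - members.min())

theta0 = theta1 = 0.3
alpha = 0.01
for _ in range(5000):
    residual = theta0 + theta1 * members_scaled - ratings
    theta0 -= alpha * residual.mean()
    theta1 -= alpha * (residual * members_scaled).mean()

cost = ((theta0 + theta1 * members_scaled - ratings) ** 2).mean() / 2
print("Final cost:", cost)  # small and finite instead of overflowing
```

Remember that predictions for new data must then use the same scaling (same min and max) that was applied during training.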

import numpy as np

X = [0.5, 2.5]
Y = [0.2, 0.9]

def f(w, b, x):  # sigmoid with parameters w, b
    return 1.0 / (1.0 + np.exp(-(w * x + b)))


def error(w, b):
    err = 0.0
    for x, y in zip(X, Y):
        fx = f(w, b, x)
        err += 0.5 * (fx - y)**2
    return err

def grad_b(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx)

def grad_w(w, b, x, y):
    fx = f(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def do_gradient_descent():
    w, b, eta, max_epochs = 1, 1, 0.01, 100
    for i in range(max_epochs):
        dw, db = 0, 0
        for x, y in zip(X, Y):
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        w = w - eta * dw
        print(w)
        b = b - eta * db
        print(b)
    er = error(w, b)
    #print(er)
    return er
## Calling the gradient descent function
do_gradient_descent()
