Multiple linear regression from scratch: cannot find slope and intercept, gradient descent returns NaN

I implemented multiple linear regression from scratch, but I cannot get the slopes and the intercept: gradient descent gives me NaN values.

Here is my code. The IPython notebook file is also available at the link below.

https://drive.google.com/file/d/1NMUNL28czJsmoxfgeCMu3KLQUiBGiX1F/view?usp=sharing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x = np.array([[   1, 2104,    3],
           [   1, 1600,    3],
           [   1, 2400,    3],
           [   1, 1416,    2],
           [   1, 3000,    4],
           [   1, 1985,    4]])

y = np.array([399900, 329900, 369000, 232000, 539900, 299900])

def gradient_runner(x, y, altha, b, theta1, theta2):
    initial_m1 = 0
    initial_m2 = 0
    initial_b = 0
    N = len(x)

    for i in range(0, len(y)):
        x0 = x[i, 0]
        x1 = x[i, 1]
        x2 = x[i, 2]
        yi = y[i]

        h_theta = (theta1 * x1 + theta2 * x2 + b)

        initial_b += -(1/N) * x0 * (yi - h_theta) 

        initial_m1 += -(1/N) * x1 * (yi - h_theta) 

        initial_m2 += -(1/N) * x2 * (yi - h_theta)

    new_b = b - (altha * initial_b)
    new_m1 = theta1 - (altha * initial_m1)
    new_m2 = theta2 - (altha * initial_m2)
    return new_b, new_m1, new_m2

def fit(x, y, alpha, iteration, b, m1, m2):

    for i in range(0, iteration):

        b, m1, m2 = gradient_runner(x, y, alpha, b, m1, m2)
    return b, m1, m2

fit(x,y, 0.001, 1500, 0,0,0) 

This is not a programming issue but a problem with your function. NumPy supports several data types; in your case the calculation is carried out in float64. You can check the largest number this format can represent:

>>> import sys
>>> sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15,
mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

Unfortunately, your iteration does not converge for b, m1 and m2, at least not with the provided data set. At iteration 83 the values become too large to be represented as a float and are displayed as inf and -inf (infinity). When these are fed into the next iteration step, operations such as inf - inf have no defined result, so Python returns NaN ("not a number").
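The overflow-then-NaN behaviour described here can be reproduced in isolation. The following is only a minimal sketch, not part of the original code; the variable names are made up for illustration.

```python
import numpy as np

# Largest finite float64 value (the same number sys.float_info.max reports).
largest = np.finfo(np.float64).max

# Silence the overflow/invalid warnings NumPy would otherwise emit.
with np.errstate(over="ignore", invalid="ignore"):
    overflowed = largest * np.float64(10.0)  # exceeds the float64 range -> inf
    difference = overflowed - overflowed     # inf - inf is undefined -> nan

print(largest, overflowed, difference)
```

Once a parameter has reached inf, the next gradient update mixes inf and -inf terms, which is how NaN ends up in b, m1 and m2.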
Although Python offers ways to work around the limits of floating-point representation, that is not a strategy worth exploring here. The problem is that your fit function does not converge. Whether this is due to the function itself, your implementation of it, or the chosen initial guesses, I can't tell. Another common reason for a fit not converging is that the data set is not well described by the fit function.
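One possible way to narrow this down, offered as a sketch rather than a definitive diagnosis: for a least-squares cost, gradient descent with a fixed step size is stable only if the learning rate stays below 2 divided by the largest eigenvalue of X^T X / N. The names hessian, largest_eig and max_stable_alpha below are illustrative and do not appear in the question's code.

```python
import numpy as np

# Design matrix from the question (bias column, size, bedrooms), as floats.
X = np.array([[1, 2104, 3],
              [1, 1600, 3],
              [1, 2400, 3],
              [1, 1416, 2],
              [1, 3000, 4],
              [1, 1985, 4]], dtype=float)
N = len(X)

# Second derivative (Hessian) of the cost (1 / (2N)) * sum((y - X @ theta) ** 2).
hessian = X.T @ X / N

# eigvalsh is appropriate because the Hessian is symmetric.
largest_eig = np.linalg.eigvalsh(hessian).max()

# Fixed-step gradient descent on a quadratic cost needs alpha < 2 / largest_eig.
max_stable_alpha = 2.0 / largest_eig

print("largest eigenvalue:", largest_eig)
print("largest stable learning rate:", max_stable_alpha)
print("learning rate used in the question:", 0.001)
```

For this data the bound comes out far below 0.001, because the unscaled size column dominates the eigenvalue. That is consistent with each iteration overshooting by a larger amount than the previous one.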

Try scaling your x:

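def scale(x):
    # Work on a float copy: the original array holds integers, so writing
    # scaled values back into it would truncate them.
    x = x.astype(float)
    for j in range(x.shape[1]):
        mean_x = 0
        for i in range(len(x)):
            mean_x += x[i, j]
        mean_x = mean_x / len(x)
        sum_of_sq = 0
        for i in range(len(x)):
            sum_of_sq += (x[i, j] - mean_x) ** 2
        # Sample standard deviation: square root of the sample variance.
        stdev = (sum_of_sq / (x.shape[0] - 1)) ** 0.5
        if stdev == 0:
            # Constant column (the bias column of ones): leave it unchanged.
            continue
        for i in range(len(x)):
            x[i, j] = (x[i, j] - mean_x) / stdev
    return x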

Alternatively, you can use a predefined standard scaler, for example StandardScaler from scikit-learn.
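As a rough illustration of what scaling changes, here is a NumPy-only sketch that standardizes the two feature columns and then runs a vectorized version of the same gradient update. The function name fit_vectorized, the learning rate 0.1 and the 2000 iterations are assumptions chosen for this example, not values from the question. The least-squares solution from np.linalg.lstsq is used only as a reference to compare against.

```python
import numpy as np

x = np.array([[1, 2104, 3],
              [1, 1600, 3],
              [1, 2400, 3],
              [1, 1416, 2],
              [1, 3000, 4],
              [1, 1985, 4]], dtype=float)
y = np.array([399900, 329900, 369000, 232000, 539900, 299900], dtype=float)

# Standardize every column except the bias column of ones.
features = x[:, 1:]
x_scaled = x.copy()
x_scaled[:, 1:] = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)

def fit_vectorized(x, y, alpha, iterations):
    """Batch gradient descent using the same update rule as gradient_runner."""
    theta = np.zeros(x.shape[1])  # [b, m1, m2], all starting at 0
    n = len(y)
    for _ in range(iterations):
        gradient = -(x.T @ (y - x @ theta)) / n
        theta -= alpha * gradient
    return theta

theta = fit_vectorized(x_scaled, y, alpha=0.1, iterations=2000)

# Reference solution of the same least-squares problem.
reference, *_ = np.linalg.lstsq(x_scaled, y, rcond=None)

print("gradient descent:", theta)
print("least squares:   ", reference)
```

Note that the slopes obtained this way refer to the standardized features, so new inputs have to be scaled with the same mean and standard deviation before predicting.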
