
Why are my array values not getting updated? Linear regression

I need to build a linear regression model in Python without using scikit-learn. You can ignore the part that reads the input, as that part just follows the format of the file given to me. I have included my entire code in case I've done something wrong elsewhere.

import pandas as pd
import numpy as np
import matplotlib.pyplot as mlt
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in newer versions
data = pd.read_csv("housing.csv", delimiter = ' ', skipinitialspace = True, names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV'])
df_x = data.drop('MEDV', axis = 1)
df_y = data['MEDV']
x_train, x_test, y_train, y_test = train_test_split(df_x.values, df_y.values, test_size = 0.2, random_state = 4)
theta = np.zeros((1, 13))

In the above code, I've just read the input and created a parameter array called theta.

def costfn(x, y, theta):
    j = np.sum((x.dot(theta.T) - y) ** 2) / (2 * len(y))  # square each residual before summing
    return j


def gradient(x, y, theta, alpha, iterations):
    cost_history = [0] * iterations

    for i in range(iterations):
        h = theta.dot(x.T) #hypothesis
        loss = h - y
        #print(loss)
        g = loss.dot(x) / len(y)
        #print(g)
        theta = theta - alpha * g
        cost_history[i] = costfn(x, y, theta)
    #print(theta)
    return theta, cost_history

theta, cost_history = gradient(x_train, y_train, theta, 0.001, 1000)
#print(theta) 

All the print lines I have commented out produce nan values of the appropriate size.

I have used logic similar to the one used in this blog. Do tell me if I'm wrong.

I think in general your code is working. Most likely, what you observe has to do with the setting of your alpha. It seems to be too high, so theta diverges: at some point it reaches inf or -inf, and after that you get NaNs in the next iteration. I ran into the same problem.
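To get an intuition for why that happens, here is a minimal one-parameter sketch (made-up data on roughly the scale of the housing features, not taken from your code): with a single feature and no noise, each gradient step multiplies the current error theta - 2.3 by the factor 1 - alpha * mean(x**2), so as soon as that factor has a magnitude greater than 1 the error grows geometrically, overflows to inf, and the following update produces NaN.

# illustration only: one feature, true slope 2.3, no noise
import numpy as np

x = np.random.uniform(-100, 100, size=80)
y = 2.3 * x

theta = 0.0
alpha = 0.1                                            # deliberately far too large
print('update factor:', 1 - alpha * np.mean(x ** 2))   # magnitude is much larger than 1

for i in range(200):
    g = np.mean((theta * x - y) * x)    # gradient of the squared-error cost
    theta = theta - alpha * g           # numpy prints overflow warnings as theta blows up
    if i % 40 == 0 or not np.isfinite(theta):
        print(i, theta)                 # |theta| explodes, then becomes inf, then nan
    if np.isnan(theta):
        break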

You can verify that using a simple setup:

# output theta in your function
def gradient(x, y, theta, alpha, iterations):
    cost_history = [0] * iterations

    for i in range(iterations):
        h = theta.dot(x.T) #hypothesis
        #print('h:', h)
        loss = h - y
        #print('loss:', loss)
        g = loss.dot(x) / len(y)
        #print('g:', g)
        theta = theta - alpha * g
        print('theta:', theta)
        cost_history[i] = costfn(x, y, theta)
    #print(theta)
    return theta, cost_history

# set up example data with a simple linear relationship
# (plus some noise) where we can conveniently play around
# with different numbers of parameters
num_params = 2   # how many params do you want to estimate (up to 5)
# take some fixed params (we only take num_params of them)
real_params = [2.3, -0.1, 8.5, -1.8, 3.2]

# now generate the data for the number of parameters chosen
x_train = np.random.randint(-100, 100, size=(80, num_params))
x_noise = np.random.randint(-100, 100, size=(80, num_params)) * 0.001
y_train = (x_train + x_noise).dot(np.array(real_params[:num_params]))
theta = np.zeros(num_params)

Now try it with a high learning rate:

theta, cost_history = gradient(x_train, y_train, theta, 0.1, 1000)

You will most likely observe that the exponents of your theta values get higher and higher until they finally reach inf or -inf. After that, you get your NaN values.
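The step from inf to NaN is plain floating-point arithmetic: once an entry of theta has overflowed to infinity, the next update subtracts an infinite term of the same sign from it, and inf minus inf is undefined. A quick check (numpy may also emit overflow/invalid-value warnings, which is what happens inside the loop):

import numpy as np

print(np.float64(1e308) * 10)   # inf  (overflow past the largest float64)
print(np.inf - np.inf)          # nan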

If you set it to a low value like 0.00001, however, you will see that it converges:

theta: [ 0.07734451 -0.00357339]
theta: [ 0.15208803 -0.007018  ]
theta: [ 0.22431803 -0.01033852]
theta: [ 0.29411905 -0.01353942]
theta: [ 0.36157275 -0.01662507]
theta: [ 0.42675808 -0.01959962]
theta: [ 0.48975132 -0.02246712]
theta: [ 0.55062617 -0.02523144]
...
theta: [ 2.29993382 -0.09981407]
theta: [ 2.29993382 -0.09981407]
theta: [ 2.29993382 -0.09981407]
theta: [ 2.29993382 -0.09981407]

This is very close to the real parameters 2.3 and -0.1.
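As a quick sanity check (assuming the run above with num_params = 2), you can compare the fitted theta with the parameters that generated the data:

print(np.round(theta, 4))                  # roughly [ 2.2999 -0.0998]
print(real_params[:num_params])            # [2.3, -0.1]
print(np.allclose(theta, real_params[:num_params], atol=0.01))   # True once it has converged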

So you could experiment with code that adapts the learning rate, so the values converge faster and the risk of divergence is lower. You could also implement something like early stopping, so that it stops iterating over the samples if the error doesn't change or the change is below a threshold.

E.g. you could use the following modification of your function:

def gradient(
        x,
        y,
        theta=None,
        alpha=0.1,
        alpha_factor=0.1 ** (1 / 5),
        change_threshold=1e-10,
        max_iterations=500,
        verbose=False):
    cost_history = list()
    if theta is None:
        # theta was not passed explicitly, so initialize it
        theta = np.zeros(x.shape[1])
    last_loss_sum = float('inf')
    len_y = len(y)
    for i in range(1, max_iterations + 1):
        h = theta.dot(x.T)  # hypothesis
        loss = h - y
        loss_sum = np.sum(np.abs(loss))
        if last_loss_sum <= loss_sum:
            # the loss didn't decrease, so decrease alpha
            alpha = alpha * alpha_factor
        if verbose:
            print(f'pass: {i:4d} loss: {loss_sum:.8f} / alpha: {alpha}')
        theta_old = theta
        g = loss.dot(x) / len_y
        if loss_sum <= last_loss_sum and last_loss_sum < float('inf'):
            # only apply the update if the loss did not increase and the
            # previous loss was finite, to avoid infinite entries in theta
            theta = theta - alpha * g
            theta_change = np.sum(np.abs(theta_old - theta))
            if theta_change < change_threshold:
                # This may seem a bit awkward, but comparing the change in
                # theta against change_threshold takes the relationship
                # between theta and g into account: g has no effect if theta
                # is orders of magnitude larger than g (elementwise), even if
                # g itself is large.
                cost_history.append(costfn(x, y, theta))
                break
        cost_history.append(costfn(x, y, theta))
        last_loss_sum = loss_sum
    return theta, cost_history

The changes address early stopping, automatic adjustment of alpha, and preventing theta from taking on infinite values. In the minimal case you only need to pass x and y; all other parameters get default values. Set verbose=True if you want to see how the loss decreases in each iteration.
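For example, a minimal call on the synthetic x_train / y_train from the example above (this assumes costfn and the modified gradient are already defined; the plot simply visualizes cost_history):

import matplotlib.pyplot as plt

# only x and y are required, everything else falls back to the defaults
theta, cost_history = gradient(x_train, y_train, verbose=True)
print('theta:', theta)
print('iterations actually run:', len(cost_history))

# plot the cost history to see how the cost drops and where it levels off
plt.plot(cost_history)
plt.xlabel('iteration')
plt.ylabel('cost')
plt.show()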
