Using sklearn and polynomial regression to fit/predict the equation of a curve: infinite loop error

I've been given a dataset and I'm trying to find a relation between some X and Y data. I want to use the sklearn library to plot the data and to predict/plot the curve of the equation.

However, my code gets stuck in an infinite loop when I try to plot my predicted values after fitting the polynomial regression model to my dataset.

The end goal is that, once I have the curve predicted/plotted, I would like to know the full equation of the curve.

Here's my code.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Dataset = pd.DataFrame()
Dataset["X"] = [6377, 6378, 6379, 6380, 6381, 6382, 6383, 6385, 6387, 6392, 6397, 6402]
Dataset["Y"] = [1.225, 1.112, 1.007, 0.9093, 0.8194, 0.7364, 0.6601, 0.5258, 0.4135, 0.1948, 0.08891, 0.04008]

print(Dataset)

X = np.reshape(np.array(Dataset['X']), (1, -1))
Y = np.reshape(np.array(Dataset['Y']), (1, -1))

print(X)
print(Y)

from sklearn.linear_model import LinearRegression

linReg = LinearRegression()
linReg.fit(X, Y)

plt.scatter(X, Y, color='red')
# plt.plot(X,linReg.predict(X), color = 'blue')

from sklearn.preprocessing import PolynomialFeatures

polyREG = PolynomialFeatures(degree=4)

xPoly = polyREG.fit_transform(X)

LinReg2 = LinearRegression()
LinReg2.fit(xPoly, Y)
#
# try:
#     xgrid = np.arange(min(X), max(X), .1)
# except Exception as e:
#     print(e)

# xgrid = range(6377, 6403, 1)
# xgrid = np.asarray(xgrid)
# print(xgrid.shape)
# xgrid = np.reshape(xgrid, (1,-1))

xgrid = np.reshape(np.arange(6300, 6405, 1), (1,-1))
print(xgrid.shape)
#X = np.reshape(np.array(Dataset['X']), (1, -1))
#plt.plot(xg, 1, color = "blue")
try:
    plt.plot(xgrid, LinReg2.predict(polyREG.fit_transform(xgrid)), color='blue')
except Exception as e:
    print(e)
plt.show()

It's not an infinite loop; it's just taking a while. When I ran polyREG.fit_transform(xgrid) on its own, it took about a minute. When I then ran LinReg2.predict(polyREG.fit_transform(xgrid)), I got: "shapes (1,5563251) and (1820,12) not aligned: 5563251 (dim 1) != 1820 (dim 0)".
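To see where the (1820,12) in that message comes from, here is a minimal sketch that repeats only the training half of the question's code (the cheap part) and prints the shapes. The variable names are mine, and I'm assuming the message's second shape is the transposed coefficient matrix of the fitted model, which matches what the shapes below show.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_values = [6377, 6378, 6379, 6380, 6381, 6382, 6383, 6385, 6387, 6392, 6397, 6402]
y_values = [1.225, 1.112, 1.007, 0.9093, 0.8194, 0.7364, 0.6601, 0.5258,
            0.4135, 0.1948, 0.08891, 0.04008]

# Same reshape as in the question: one row with twelve columns.
X = np.reshape(np.array(x_values, dtype=float), (1, -1))
Y = np.reshape(np.array(y_values), (1, -1))

poly = PolynomialFeatures(degree=4)
x_poly = poly.fit_transform(X)
model = LinearRegression().fit(x_poly, Y)

print(X.shape)            # (1, 12): 1 observation of 12 variables
print(x_poly.shape)       # (1, 1820): every degree <= 4 term in 12 variables
print(model.coef_.shape)  # (12, 1820): 12 targets, 1820 features each
```

So the model was trained on rows of 1820 features, while the prediction row has 5563251 of them.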

Edit after looking through the code some more:

Presumably, you're trying to train on 12 observations of one base variable and fit a fourth-order polynomial to it. That gives 12 observations of 5 derived variables (x^0, x^1, x^2, x^3, x^4), for a total of 60 x-values (12 rows of 5 values). You then want to predict on 105 new base values of x (np.arange(6300, 6405, 1) has 105 elements), giving 525 x-values in total (105 rows of 5 values). However, with the shape (1, -1), PolynomialFeatures thinks you have 1 observation of 105 variables rather than 105 observations of 1 variable. Because of the cross terms, the number of derived variables is polynomial in the number of base variables: for degree 4 and n base variables it is C(n + 4, 4). Instead of 105 rows of 5 values, there is 1 row of 5563251 values. Not only does that take a while to evaluate, but the prediction also fails, because the rows of the training set (1820 values each, from 12 base variables) and the rows of the prediction set do not contain the same number of values.
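The counts above can be checked with a few lines of arithmetic. This sketch uses only the standard library; n_poly_features is a helper name I made up for illustration, and it counts the monomials of total degree at most d in n variables (including the constant term), which is C(n + d, d).

```python
from math import comb


def n_poly_features(n_vars: int, degree: int) -> int:
    """Number of monomials of total degree <= degree in n_vars variables."""
    return comb(n_vars + degree, degree)


one_variable = n_poly_features(1, 4)         # intended case: x^0 .. x^4
training_row = n_poly_features(12, 4)        # 12 values read as 12 variables
prediction_row = n_poly_features(105, 4)     # 105 grid points read as 105 variables

print(one_variable, training_row, prediction_row)
```

The last two numbers are the 1820 and 5563251 that appear in the error message.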

tl;dr change (1, -1) to (-1, 1) in your reshape commands.
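For reference, a minimal sketch of the question's script with that change applied. I kept the question's names where I could; y_pred is mine. I also used transform instead of a second fit_transform on the grid, which is my choice rather than something the fix requires. Plotting is left out so the sketch runs anywhere.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x_values = [6377, 6378, 6379, 6380, 6381, 6382, 6383, 6385, 6387, 6392, 6397, 6402]
y_values = [1.225, 1.112, 1.007, 0.9093, 0.8194, 0.7364, 0.6601, 0.5258,
            0.4135, 0.1948, 0.08891, 0.04008]

# (-1, 1): one column, i.e. 12 observations of a single variable.
X = np.reshape(np.array(x_values, dtype=float), (-1, 1))
Y = np.reshape(np.array(y_values), (-1, 1))

polyREG = PolynomialFeatures(degree=4)
xPoly = polyREG.fit_transform(X)          # shape (12, 5): 1, x, x^2, x^3, x^4

LinReg2 = LinearRegression()
LinReg2.fit(xPoly, Y)

# Prediction grid, also as a single column: 105 observations of one variable.
xgrid = np.reshape(np.arange(6300, 6405, 1, dtype=float), (-1, 1))
y_pred = LinReg2.predict(polyREG.transform(xgrid))   # shape (105, 1)

# The fitted curve is
#   y = intercept + c[1]*x + c[2]*x**2 + c[3]*x**3 + c[4]*x**4
# where c[0] belongs to the constant column that PolynomialFeatures adds.
# Raw powers of x near 6400 span many orders of magnitude, so these
# numbers may be numerically fragile.
print(LinReg2.intercept_)
print(LinReg2.coef_)
```

With these shapes, plt.plot(xgrid, y_pred, color='blue') from the original script should return right away.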

