I am new to sklearn and I have a fairly simple task: given a scatter plot of 15 dots, I need to split them into training and test sets and then fit a polynomial regression to the training data.
But I got stuck at the second step.
This is the code that produces the data plot:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10

X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

plt.figure()
plt.scatter(X_train, y_train, label='training data')
plt.scatter(X_test, y_test, label='test data')
plt.legend(loc=4);
I then take the 11 points in X_train and transform them with polynomial features of degree 3, as follows:
degree = 3
poly = PolynomialFeatures(degree=degree)
X_train_poly = poly.fit_transform(X_train)
Then I try to fit a line through the transformed points (note: X_train_poly.size = 364).
linreg = LinearRegression().fit(X_train_poly, y_train)
and I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 11]
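To see why the shapes clash, it helps to compare what PolynomialFeatures produces for the two possible interpretations of an 11-element array. This is a small sketch with made-up numbers (np.arange stands in for the actual X_train):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)

# Interpreted as 1 sample with 11 features: C(11+3, 3) = 364 output columns
one_sample = np.arange(11, dtype=float).reshape(1, -1)
shape_wide = poly.fit_transform(one_sample).shape   # (1, 364)

# Interpreted as 11 samples with 1 feature each: 4 columns (1, x, x^2, x^3)
many_samples = np.arange(11, dtype=float).reshape(-1, 1)
shape_tall = poly.fit_transform(many_samples).shape  # (11, 4)

print(shape_wide, shape_tall)
```

The first interpretation is what produced X_train_poly.size = 364 in the question: one row of 364 features, which cannot be paired with the 11 values in y_train.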
I have read various questions that address similar and often more complex problems (e.g. Multivariate (polynomial) best fit curve in python?), but I could not extract a solution from them.
The issue is the dimensionality of X_train and y_train. X_train is a one-dimensional array, so sklearn treats each of its 11 values as a separate feature of a single sample.
Using the .reshape command as follows should do the trick:
# reshape the data to 11 rows (samples) with 1 column (feature) each,
# rather than 1 row with 11 columns
X_trainT = X_train.reshape(11, 1)
y_trainT = y_train.reshape(11, 1)

# create polynomial features on the single variable
poly = PolynomialFeatures(degree=3)
X_train_poly = poly.fit_transform(X_trainT)
print(X_train_poly.shape)  # (11, 4)

# fit the regression on the correctly shaped arrays
linreg = LinearRegression().fit(X_train_poly, y_trainT)
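The same reshaping applies when evaluating on the held-out points: transform X_test with the already-fitted poly object before predicting. A self-contained sketch reproducing the question's data (the variable names mirror the code above; linreg.score returns the R² on the test set):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# same synthetic data as in the question
np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

# reshape to (n_samples, 1) before building polynomial features
poly = PolynomialFeatures(degree=3)
X_train_poly = poly.fit_transform(X_train.reshape(-1, 1))
linreg = LinearRegression().fit(X_train_poly, y_train)

# transform the test points with the SAME fitted transformer, then score
X_test_poly = poly.transform(X_test.reshape(-1, 1))
r2 = linreg.score(X_test_poly, y_test)
print(r2)
```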
The error basically means that your X_train_poly and y_train don't match: X_train_poly contains only one sample, while y_train has 11 values. I'm not quite sure what you want, but I guess the polynomial features were not generated the way you intended. What your code currently does is generate the degree-3 polynomial features of a single 11-dimensional point.
I think you want to generate the degree-3 polynomial features for each of your 11 points (that is, for each x). You can use a loop or a list comprehension to do that:
X_train_poly = poly.fit_transform([[i] for i in X_train])
X_train_poly.shape
# (11, 4)
Now you can see that X_train_poly has 11 points, each of which is 4-dimensional, rather than a single 364-dimensional point. This new X_train_poly matches the shape of y_train, and the regression gives you what you want:
linreg = LinearRegression().fit(X_train_poly, y_train)
linreg.coef_
# array([ 0. , -0.79802899, 0.2120088 , -0.01285893])
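To sanity-check the fit visually, one can predict over a dense grid of x values and plot the result against the scatter. A sketch under the question's setup (the grid name and its 100-point resolution are arbitrary choices; the plotting call itself is omitted):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# same synthetic data as in the question
np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

poly = PolynomialFeatures(degree=3)
linreg = LinearRegression().fit(
    poly.fit_transform([[i] for i in X_train]), y_train)

# predict over a dense grid to draw the fitted curve,
# e.g. plt.plot(grid, y_hat) on top of the training scatter
grid = np.linspace(0, 10, 100).reshape(-1, 1)
y_hat = linreg.predict(poly.transform(grid))
print(y_hat.shape)   # (100,)
```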