
SKLearn Predicting using new Data

I've tried out linear regression using scikit-learn. I have data along these lines:

Calories Eaten | Weight
150 | 150
300 | 190
350 | 200

These are basically made-up numbers, but I've fit the dataset with the linear regression model.

What I'm confused about is how to predict with new data. Say I have 10 new Calories Eaten values and I want the model to predict Weight for them?

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)  # ??

But how would I make only my 10 new Calories Eaten values the test set that I want the regressor to predict on?

You are correct: you simply call the predict method of your model and pass in the new, unseen data. It also depends on what you mean by new data. Are you referring to data whose outcome you do not know (i.e. you do not know the weight value), or is this data being used to test the performance of your model?

For new data (to predict on):

Your approach is correct. You can access all predictions by simply printing the y_pred variable.
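As a minimal sketch of that, assuming the three made-up rows from the question are the training data and that the ten new calorie values below are hypothetical: scikit-learn expects the features as a 2D array of shape (n_samples, n_features), so a single column of numbers is reshaped with reshape(-1, 1) before calling fit or predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data from the question: Calories Eaten -> Weight.
calories = np.array([150, 300, 350]).reshape(-1, 1)  # 2D: (n_samples, 1 feature)
weight = np.array([150, 190, 200])

regressor = LinearRegression()
regressor.fit(calories, weight)

# Ten hypothetical new Calories Eaten values whose weights are unknown.
new_calories = np.array(
    [100, 200, 250, 275, 320, 400, 450, 500, 550, 600]
).reshape(-1, 1)

# One predicted weight per new row.
new_weight_pred = regressor.predict(new_calories)
print(new_weight_pred)
```

No test set is needed for this step; the new values are simply passed to predict.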

If you know the respective weight values and want to evaluate the model:

Make sure that you have two separate data sets: x_test (containing the features) and y_test (containing the labels). Generate the predictions as you are doing with the y_pred variable, then measure performance with one of several metrics. A common one is the root mean squared error (RMSE), which is computed from y_test and y_pred. The sklearn.metrics module lists the regression performance metrics supplied by sklearn.
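A small sketch of the RMSE calculation, with hypothetical y_test and y_pred values standing in for real ones. It takes the square root of mean_squared_error from sklearn.metrics, which works across scikit-learn versions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true weights and model predictions for three test rows.
y_test = np.array([150.0, 190.0, 200.0])
y_pred = np.array([152.0, 187.0, 204.0])

# Mean squared error first, then its square root gives the RMSE,
# which is in the same units as Weight.
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(rmse)
```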

If you do not know the weight value of the 10 new data points:

Use train_test_split to split your initial data set into two parts: training and testing. You would then have 4 datasets: x_train, x_test, y_train, y_test.

from sklearn.model_selection import train_test_split
# random_state can be any number (it makes the split reproducible); test_size=0.25 holds out 25% for testing
# Note the return order: both feature splits first, then both label splits
x_train, x_test, y_train, y_test = train_test_split(calories_eaten, weight, test_size=0.25, random_state=42)

Train the model by fitting it on x_train and y_train. Then evaluate its performance on unseen data by predicting on x_test and comparing these predictions with the actual values in y_test. This gives you an idea of how well the model generalizes. You can then predict the weight values for the 10 new data points by passing them to predict in the same way.
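To make the split concrete, here is a sketch on synthetic data (the 20 calorie values and the linear rule used to generate weight are assumptions for illustration only). train_test_split returns the train and test parts of each input array in turn, so the unpacking order is x_train, x_test, y_train, y_test:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 calorie values and a made-up linear weight rule.
calories_eaten = np.arange(100, 500, 20).reshape(-1, 1)  # shape (20, 1)
weight = 0.25 * calories_eaten.ravel() + 110

# Features come back first (train, test), then labels (train, test).
x_train, x_test, y_train, y_test = train_test_split(
    calories_eaten, weight, test_size=0.25, random_state=42
)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

With 20 rows and test_size=0.25, 15 rows go to training and 5 to testing.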

It is also worth reading further on the topic as a beginner. This is a simple tutorial to follow.

What I'm confused about is how to predict with new data. Say I have 10 new Calories Eaten values and I want the model to predict Weight for them?

Yes, Calories Eaten is the independent variable and Weight is the dependent variable.

After you split the data into a training set and a test set, the next step is to fit the regressor using the X_train and y_train data.

After the model is trained, you can call the predict method on X_test to get y_pred.

Now you can compare y_pred (the predicted data) with y_test, which is the real data.

You can also use the score method of your fitted linear model to measure its performance.

For a regressor, score returns the R^2 (R squared) metric, also called the coefficient of determination.

score = regressor.score(X_test, y_test)
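As a quick check of what score reports, the sketch below (with hypothetical test rows added to the question's made-up training data) compares it with r2_score from sklearn.metrics applied to the model's own predictions. The two values should agree:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Made-up training data from the question plus two hypothetical test rows.
X_train = np.array([[150], [300], [350]])
y_train = np.array([150, 190, 200])
X_test = np.array([[200], [400]])
y_test = np.array([165, 215])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# score() predicts on X_test internally and returns R^2 against y_test.
score = regressor.score(X_test, y_test)
manual_r2 = r2_score(y_test, regressor.predict(X_test))
print(score, manual_r2)
```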

To split the data, you can use the train_test_split function.

from sklearn.model_selection import train_test_split
# Return order is X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(eaten, weight, test_size=0.2, random_state=0)

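Split the dataset using train_test_split from sklearn.model_selection, then fit the model on the training data and predict on the test data.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Return order is X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(eaten, weight)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)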

