
How to use .predict() in a Linear Regression model?

I'm trying to predict what a 15-minute delay in flight departure does to the flight's arrival time. I have a DataFrame with thousands of rows and several columns. Two of these columns are dep_delay and arr_delay, for departure delay and arrival delay. I have built a simple LinearRegression model:

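from sklearn.linear_model import LinearRegression

# Despite its name, y holds the feature (departure delay) as a 2-D column
y = nyc['dep_delay'].values.reshape((-1, 1))

arr_dep_model = LinearRegression().fit(y, nyc['arr_delay'])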

Now I'm trying to find the predicted arrival delay if the flight's departure was delayed by 15 minutes. How would I use the model above to predict what the arrival delay would be?

My first thought was to use a for loop / if statement, but then I came across .predict() and now I'm even more confused. Does .predict() work like a boolean, where I would use "if departure delay is equal to 15, then arrival delay equals y"? Or is it something like:

arr_dep_model.predict(y)?

When working with LinearRegression models in sklearn, you perform inference with the predict() method. The input you pass must have the same layout as the training data: a 2-D array of shape (n_samples, n_features) with the same number of columns the model was fitted on. The official documentation describes predict() in more detail.
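As a minimal sketch of the shape requirement, the following fits a model on a small made-up stand-in for the nyc data and predicts the arrival delay for a 15-minute departure delay. The values and the exact linear relation are assumptions for illustration only and do not come from the real dataset; the point is that the input to predict() is a 2-D array with one row and one column.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for nyc['dep_delay'] and nyc['arr_delay'] (made-up values)
dep_delay = np.array([0, 5, 10, 20, 30, 60], dtype=float)
arr_delay = dep_delay - 2.0  # assumed exact linear relation, for illustration only

X = dep_delay.reshape(-1, 1)  # shape (n_samples, 1), as in the question
model = LinearRegression().fit(X, arr_delay)

x_new = np.array([[15.0]])      # one sample, one feature -> shape (1, 1)
pred_15 = model.predict(x_new)  # 1-D array holding a single prediction
print(pred_15.shape, pred_15[0])
```

With the real data, the same call would be arr_dep_model.predict([[15]]); the predicted value then depends on the fitted coefficients.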

arr_dep_model.predict(yourInput)

This line returns the value the model predicts for the corresponding input. You can put it inside a for loop and iterate over a set of values to use as the model's input; whether that makes sense depends on the needs of your project and the data you are working with.
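A loop is one option, but predict() also accepts several rows at once and returns one prediction per row. The sketch below compares both approaches on made-up data; the training values and the candidate delays are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: arrival delay assumed to be 0.9 * departure delay + 1
dep_delay = np.array([0, 10, 20, 40, 80], dtype=float).reshape(-1, 1)
arr_delay = 0.9 * dep_delay.ravel() + 1.0
model = LinearRegression().fit(dep_delay, arr_delay)

# Three candidate departure delays as a 2-D array of shape (3, 1)
candidates = np.array([5.0, 15.0, 30.0]).reshape(-1, 1)

# One call: returns a 1-D array with one prediction per input row
batch_preds = model.predict(candidates)

# Equivalent loop: each call receives a single-row 2-D input
loop_preds = np.array([model.predict([[d]])[0] for d in candidates.ravel()])

print(batch_preds)
print(loop_preds)
```

Both approaches give the same numbers; the single call is usually shorter and faster for large inputs.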

Hi, check the code below for an example:

import pandas as pd
import random
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'x1': random.choices(range(0, 100), k=10),
                   'x2': random.choices(range(0, 100), k=10)})
df['y'] = df['x2'] * .5

X_train = df[['x1', 'x2']][:-3].values  # Training on top 7 rows
y_train = df['y'][:-3].values           # Training on top 7 rows
X_test = df[['x1', 'x2']][-3:].values   # Values on which prediction will happen - bottom 3 rows

regr = LinearRegression()
regr.fit(X_train, y_train)
regr.predict(X_test)

Notice that X_test, the data on which the prediction is made, has the same shape (number of columns) as X_train: both have two columns, ['x1', 'x2']. Using .values converts each of them to an array. You can create your own data (a two-column DataFrame in this example) and use that for prediction, since the third column, y, is the one being predicted.

The output is an array of three values, one prediction for each of the three test rows.
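To illustrate predicting on your own data, here is a small sketch along the same lines. The training values and the new_rows DataFrame are made up for this example. It also shows what happens when the input has the wrong number of columns: predict() raises a ValueError.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up training data with the same structure as the example: y = 0.5 * x2
df = pd.DataFrame({
    "x1": [3, 14, 15, 92, 65, 35, 89, 79, 32, 38],
    "x2": [46, 26, 43, 38, 32, 79, 50, 28, 84, 19],
})
df["y"] = df["x2"] * 0.5

regr = LinearRegression().fit(df[["x1", "x2"]].values, df["y"].values)

# Hypothetical new inputs: same two columns, in the same order as in training
new_rows = pd.DataFrame({"x1": [10, 20], "x2": [40, 80]})
new_preds = regr.predict(new_rows[["x1", "x2"]].values)
print(new_preds)  # one prediction per row, close to 0.5 * x2

# Passing only one column does not match the two features used for fitting
try:
    regr.predict(new_rows[["x2"]].values)
    shape_error = None
except ValueError as exc:
    shape_error = exc
print(type(shape_error).__name__)
```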
