How to use a linear regression model to produce a single prediction value?
I have created three machine learning models using scikit-learn in Jupyter Notebook (linear regression, decision tree and random forest). The purpose of the models is to predict the size of a cyclone (prediction/output ROCI) based on several cyclone parameters (predictors/inputs). There are 9004 rows. Below is an example of the linear regression model.
In [31]: df.head()
Out[31]:
     NAME    LAT  LON   Pc  Penv  ROCI  Vmax  Pdc
0  HECTOR    -15  128  985  1000   541    18  -15
1  HECTOR    -15  127  990  1000   541  15.4  -10
2  HECTOR    -16  126  992  1000   530    15   -8
3  HECTOR  -16.3  126  992  1000   480  15.4   -8
4  HECTOR  -16.5  126  992  1000   541  15.4   -8
In [32]: X=df[['LAT','LON','Pc','Vmax','Pdc=Pc-Penv']]
y=df['ROCI']
In [33]: X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.4)
In [34]: lm=LinearRegression()
In [35]: lm.fit(X_train,y_train)
Out [35]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
In [36]: print(lm.intercept_)
lm.coef_
-3464.3452921023572
Out [36]: array([-2.94229126, 0.29875575, 3.65214265, -1.25577799,
-6.43917746])
In [37]: predictions=lm.predict(X_test)
predictions
Out [37]:array([401.02108725, 420.01451472, 434.4241271 , ...,
287.67803538, 343.80516896, 340.1007666 ])
In [38]: plt.scatter(y_test,predictions)
plt.xlabel('Recorded')
plt.ylabel('Predicted')
*figure to display accuracy*
Now, when I try to pass a single value to lm.predict(), I get the following error:
ValueError: Expected 2D array, got scalar array instead:
array=300.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I assume this is because my model was trained on 5 columns, so I tried passing the first row of my dataset:
In [39]: lm.predict(-15,128,985,18,-15)
...
...
TypeError: predict() takes 2 positional arguments but 6 were given
Trying array.reshape as suggested, I get:
In [49]: lm.predict(X_test.reshape(-1, 1))
...
...
AttributeError: 'DataFrame' object has no attribute 'reshape'
And now I am confused! Could you please help me use my model to get a single prediction value? What should I pass to lm.predict()? I basically just want to be able to say "Pc=990, Vmax=18, Pdc=-12" and get something like "ROCI=540". Thank you for your time.
If you want to predict the first row of your data, you should first put it into an array:
import numpy as np
first_row = np.array([-15, 128, 985, 18, -15])
Then, when
lm.predict(first_row)
produces an error similar to the one you report,
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
follow the advice in the message. Here first_row is a single sample with 5 features, so reshape it to one row, i.e.:
lm.predict(first_row.reshape(1, -1))
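To make the shape issue concrete, here is a minimal self-contained sketch. It does not use your cyclone data: the training set is synthetic (random numbers with made-up coefficients), and only the structure, 5 features in and one value out, is meant to mirror your model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the cyclone data (assumption: 5 numeric predictors).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
true_coef = np.array([-3.0, 0.3, 3.5, -1.2, -6.4])  # made-up coefficients
y_train = X_train @ true_coef + 10.0

lm = LinearRegression().fit(X_train, y_train)

# One sample as a 1-D array of shape (5,).
first_row = np.array([-15, 128, 985, 18, -15])

# predict() expects shape (n_samples, n_features), so turn (5,) into (1, 5).
pred_reshaped = lm.predict(first_row.reshape(1, -1))

# Equivalent: a nested list is already 2-D (one row, five columns).
pred_nested = lm.predict([[-15, 128, 985, 18, -15]])

print(pred_reshaped.shape)      # (1,)
print(float(pred_reshaped[0]))  # the single predicted value
```

predict() always returns one value per input row, so even for a single sample you get an array of length 1 and index it with [0] to obtain a plain number.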
To get a prediction from only Pc, Vmax and Pdc, as in your last example, you'd have to train the model on just those columns, i.e. write
X=df[['Pc','Vmax','Pdc=Pc-Penv']]
instead of
X=df[['LAT','LON','Pc','Vmax','Pdc=Pc-Penv']]
Remember: the inputs you give your model at training time are the same ones, in the same order, that you have to give it at prediction time.
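As a rough sketch of the "Pc=990, Vmax=18, Pdc=-12" use case: the DataFrame below is synthetic (made-up values and a made-up linear relationship), the column is named Pdc here for brevity, and predict_roci is a hypothetical helper, not part of scikit-learn. The idea is to build a one-row DataFrame whose columns match the training columns.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["Pc", "Vmax", "Pdc"]  # must match the training columns, in order

# Synthetic stand-in for the real DataFrame (all values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Pc": rng.uniform(950, 1005, 200),
    "Vmax": rng.uniform(10, 60, 200),
    "Pdc": rng.uniform(-50, 0, 200),
})
df["ROCI"] = 2.0 * df["Pc"] - 1.5 * df["Vmax"] - 4.0 * df["Pdc"] - 1500.0

# Train on exactly the three columns you intend to supply later.
lm = LinearRegression().fit(df[FEATURES], df["ROCI"])

def predict_roci(model, pc, vmax, pdc):
    """Hypothetical helper: predict ROCI for one cyclone observation."""
    # One row, three columns: shape (1, 3), with the training column names.
    row = pd.DataFrame([[pc, vmax, pdc]], columns=FEATURES)
    return float(model.predict(row)[0])

print(predict_roci(lm, pc=990, vmax=18, pdc=-12))
```

Passing a DataFrame with the same column names as at fit time also avoids the feature-name warning that recent scikit-learn versions emit when a model fitted on a DataFrame is given a bare array.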