Single prediction with linear regression
I implemented linear regression as follows:
from sklearn.linear_model import LinearRegression
x = [1,2,3,4,5,6,7]
y = [1,2,1,3,2.5,2,5]
# Create linear regression object
regr = LinearRegression()
# Train the model using the training sets
regr.fit([x], [y])
# print(x)
regr.predict([[1, 2000, 3, 4, 5, 26, 7]])
This produces:
array([[1. , 2. , 1. , 3. , 2.5, 2. , 5. ]])
When using the predict function, why can't I pass a single x value to make a prediction? Trying
regr.predict([[2000]])
returns:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-3a8b477f5103> in <module>()
11
12 # print(x)
---> 13 regr.predict([[2000]])
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
254 Returns predicted values.
255 """
--> 256 return self._decision_function(X)
257
258 _preprocess_data = staticmethod(_preprocess_data)
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in _decision_function(self, X)
239 X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
240 return safe_sparse_dot(X, self.coef_.T,
--> 241 dense_output=True) + self.intercept_
242
243 def predict(self, X):
/usr/local/lib/python3.6/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
138 return ret
139 else:
--> 140 return np.dot(a, b)
141
142
ValueError: shapes (1,1) and (7,7) not aligned: 1 (dim 1) != 7 (dim 0)
When you do this:
regr.fit([x], [y])
you are essentially passing this:
regr.fit([[1,2,3,4,5,6,7]], [[1,2,1,3,2.5,2,5]])
which has a shape of (1,7) for X and (1,7) for y.
Now look at the documentation of fit():
Parameters:
X : numpy array or sparse matrix of shape [n_samples, n_features]
    Training data
y : numpy array of shape [n_samples, n_targets]
    Target values. Will be cast to X's dtype if necessary
So here the model assumes that you have a single sample with 7 features and 7 targets. This setup is known as multi-output regression.
So at prediction time, the model requires data with 7 features, that is, of shape (n_samples_to_predict, 7), and it outputs predictions of shape (n_samples_to_predict, 7).
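As a rough check of the shapes described above, the sketch below fits the model with the wrapped lists and inspects what scikit-learn stored. The names X_wrapped, y_wrapped and regr_wrapped are introduced here only for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# Wrapping each list in [] gives one row with seven columns.
X_wrapped = np.array([x])
y_wrapped = np.array([y])
print(X_wrapped.shape, y_wrapped.shape)

# Illustrative name; this mirrors regr.fit([x], [y]) from the question.
regr_wrapped = LinearRegression()
regr_wrapped.fit(X_wrapped, y_wrapped)

# coef_ is laid out as (n_targets, n_features), which is where the
# (7,7) in the error message comes from.
print(regr_wrapped.coef_.shape)
```

The model was fitted on a single sample, so it can only reproduce that sample's targets. This would explain why predict returned the original y values in the question.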
If instead you wanted something like this:
x y
1 1.0
2 2.0
3 1.0
4 3.0
5 2.5
6 2.0
7 5.0
then you need a shape of (7,1) for the input x and (7,) or (7,1) for the target y.
So, as @WStokvis said in the comments, you need to do this:
import numpy as np
X = np.array(x).reshape(-1, 1)
y = np.array(y) # You may omit this step if you want
regr.fit(X, y) # Don't wrap it in []
And then again at prediction time:
X_new = np.array([1, 2000, 3, 4, 5, 26, 7]).reshape(-1, 1)
regr.predict(X_new)
After that, the following will not raise an error:
regr.predict([[2000]])
because the input has the required shape.
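Putting the pieces together, here is a minimal end-to-end sketch of the corrected approach, using the same data as the question. The variables single and manual are illustrative names, and the manual calculation is only there as a sanity check that the model is now a one-feature line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# Seven samples with one feature each: shape (7, 1).
X = np.array(x).reshape(-1, 1)

regr = LinearRegression()
regr.fit(X, y)

# One sample with one feature: shape (1, 1), matching the training layout.
single = regr.predict([[2000]])
print(single.shape)

# Sanity check: the prediction should equal intercept + slope * x.
manual = regr.intercept_ + regr.coef_[0] * 2000
print(single[0], manual)
```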
Update for the comment:
When you pass [[2000]], it is internally converted to np.array([[2000]]), so it has the shape (1,1). This corresponds to (n_samples, n_features) with n_features = 1. That is correct for the model because, at training time, the data had the shape (n_samples, 1). So this works.
Now let's say you have:
X_new = [1, 2000, 3, 4, 5, 26, 7] # You haven't wrapped it in a numpy array and called reshape(-1,1) yet
Again, it will be converted internally like this:
X_new = np.array([1, 2000, 3, 4, 5, 26, 7])
So now X_new has a shape of (7,). Note that it is only a one-dimensional array. It doesn't matter whether you think of it as a row vector or a column vector; it is just a one-dimensional array of shape (n,).
So scikit-learn cannot infer whether it is n_samples=n and n_features=1 or the other way around (n_samples=1 and n_features=n). Please see my other answer, which explains this.
So we need to explicitly convert the one-dimensional array to 2-d with reshape(-1,1). Hope it's clear now.
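To make the shape discussion concrete, here is a small NumPy-only sketch comparing the three layouts mentioned above. The variable names (flat, as_samples, as_one_sample, single) are chosen for illustration.

```python
import numpy as np

# A plain list becomes a one-dimensional array: shape (7,).
flat = np.array([1, 2000, 3, 4, 5, 26, 7])
print(flat.shape)

# reshape(-1, 1): seven samples, one feature each.
as_samples = flat.reshape(-1, 1)
print(as_samples.shape)

# reshape(1, -1): one sample with seven features.
as_one_sample = flat.reshape(1, -1)
print(as_one_sample.shape)

# A nested list is already two-dimensional: one sample, one feature.
single = np.array([[2000]])
print(single.shape)
```

The -1 asks NumPy to work out that dimension from the array's length, so the same call works for any number of samples.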