Single prediction with linear regression

Question

Implementing linear regression as below:

from sklearn.linear_model import LinearRegression

x = [1,2,3,4,5,6,7]
y = [1,2,1,3,2.5,2,5]

# Create linear regression object
regr = LinearRegression()

# Train the model using the training sets
regr.fit([x], [y])

# print(x)
regr.predict([[1, 2000, 3, 4, 5, 26, 7]])

produces:

array([[1. , 2. , 1. , 3. , 2.5, 2. , 5. ]])

When using the predict function, why can't I pass a single x value to make a prediction?

Trying regr.predict([[2000]])

returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3a8b477f5103> in <module>()
     11 
     12 # print(x)
---> 13 regr.predict([[2000]])

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
    254             Returns predicted values.
    255         """
--> 256         return self._decision_function(X)
    257 
    258     _preprocess_data = staticmethod(_preprocess_data)

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in _decision_function(self, X)
    239         X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
    240         return safe_sparse_dot(X, self.coef_.T,
--> 241                                dense_output=True) + self.intercept_
    242 
    243     def predict(self, X):

/usr/local/lib/python3.6/dist-packages/sklearn/utils/extmath.py in safe_sparse_dot(a, b, dense_output)
    138         return ret
    139     else:
--> 140         return np.dot(a, b)
    141 
    142 

ValueError: shapes (1,1) and (7,7) not aligned: 1 (dim 1) != 7 (dim 0)

Answer 1

When you do this:

regr.fit([x], [y])

you are essentially passing this:

regr.fit([[1,2,3,4,5,6,7]], [[1,2,1,3,2.5,2,5]])

which has a shape of (1,7) for X and (1,7) for y.

Now looking at the documentation of fit():

Parameters:

 X : numpy array or sparse matrix of shape [n_samples, n_features]
     Training data
 y : numpy array of shape [n_samples, n_targets]
     Target values. Will be cast to X's dtype if necessary

So here the model assumes that you have a single sample with 7 features and 7 targets. For background on this setup, see multi-output regression.

So at prediction time, the model will require data with 7 features, that is, something of shape (n_samples_to_predict, 7), and will output data of shape (n_samples_to_predict, 7).
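The shapes described above can be checked directly. The following is a minimal sketch, assuming a reasonably recent scikit-learn and NumPy; the names X_wrapped, y_wrapped, regr_wrapped and single_value_failed are illustrative and do not come from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# Wrapping each list in [] yields one row with seven columns.
X_wrapped = np.array([x])
y_wrapped = np.array([y])
print(X_wrapped.shape, y_wrapped.shape)

regr_wrapped = LinearRegression()
regr_wrapped.fit(X_wrapped, y_wrapped)

# coef_ has shape (n_targets, n_features): 7 targets by 7 features here.
print(regr_wrapped.coef_.shape)

# Prediction needs 7 features per row and returns 7 targets per row.
pred_wrapped = regr_wrapped.predict([[1, 2000, 3, 4, 5, 26, 7]])
print(pred_wrapped.shape)

# A single-feature row does not match the 7 features seen during fit.
try:
    regr_wrapped.predict([[2000]])
    single_value_failed = False
except ValueError as err:
    single_value_failed = True
    print("ValueError:", err)
```

The exact wording of the ValueError differs between scikit-learn versions, but the cause is the same feature-count mismatch shown in the traceback in the question.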

If instead you wanted each x value to be a separate sample with a single feature, paired with the corresponding y value, then you need a shape of (7,1) for input x and (7,) or (7,1) for target y.

So, as @WStokvis said in the comments, you need to do this:

import numpy as np
X = np.array(x).reshape(-1, 1)
y = np.array(y)          # Optional; you may omit this step

regr.fit(X, y)           # Don't wrap these in []

And then again at prediction time:

X_new = np.array([1, 2000, 3, 4, 5, 26, 7]).reshape(-1, 1)
regr.predict(X_new)

After that, the following will not raise an error:

regr.predict([[2000]])

because the input now has the required shape.
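Put together, the corrected workflow might look like the sketch below. It assumes scikit-learn and NumPy are installed; y_arr, single and several are illustrative names, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 1, 3, 2.5, 2, 5]

# Seven samples with one feature each, and one target per sample.
X = np.array(x).reshape(-1, 1)   # shape (7, 1)
y_arr = np.array(y)              # shape (7,)

regr = LinearRegression()
regr.fit(X, y_arr)

# One feature was seen during fit, so there is a single coefficient.
print(regr.coef_, regr.intercept_)

# A single x value is passed as one row with one column.
single = regr.predict([[2000]])
print(single)

# Several x values are passed as one row per value.
several = regr.predict(np.array([1, 2000, 3]).reshape(-1, 1))
print(several)
```

Each prediction call returns one value per input row, so single has one element and several has three.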

Update in response to the comment:

When you pass [[2000]], it is internally converted to np.array([[2000]]), so it has the shape (1,1). This corresponds to (n_samples, n_features) with n_features = 1. That matches the model, because the training data had shape (n_samples, 1). So this works.

Now let's say you have:

X_new = [1, 2000, 3, 4, 5, 26, 7]  # Not yet wrapped in a numpy array or reshaped with reshape(-1, 1)

Again, it will be internally converted like this:

X_new = np.array([1, 2000, 3, 4, 5, 26, 7])

So now X_new has a shape of (7,). It is only a one-dimensional array. It is neither a row vector nor a column vector, just a one-dimensional array of shape (n,).

So scikit-learn cannot infer whether it means n_samples=n and n_features=1, or the other way around (n_samples=1 and n_features=n). Please see my other answer, which explains this.

So we need to explicitly convert the one-dimensional array to 2-D with reshape(-1, 1). Hope it's clear now.
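The ambiguity is easy to see with NumPy alone. This is a small sketch using illustrative variable names (flat, as_samples, as_one_row, single_row) that are not in the original answer.

```python
import numpy as np

values = [1, 2000, 3, 4, 5, 26, 7]

# A plain list becomes a one-dimensional array with no row/column meaning.
flat = np.array(values)
print(flat.shape)

# reshape(-1, 1): seven samples, one feature each.
as_samples = flat.reshape(-1, 1)
print(as_samples.shape)

# reshape(1, -1): one sample with seven features.
as_one_row = flat.reshape(1, -1)
print(as_one_row.shape)

# A nested list such as [[2000]] is already two-dimensional.
single_row = np.array([[2000]])
print(single_row.shape)
```

Both reshapes hold the same seven numbers; only the explicit two-dimensional shape tells the estimator which axis is samples and which is features.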


Solution 1
5 Accepted 2018-04-30 05:37:23