
How to deal with a dimension error in multiple linear regression?

I am trying to do multiple linear regression with sklearn:

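import pandas as pd
from sklearn import linear_model

features_2 = ['chronic_disease_binary', 'outcome']

X = df.loc[:, features_2].values   # 2 feature columns
Y = df.loc[:, ['age']].values      # 1 target column
# X = pd.get_dummies(X,drop_first=True)
#
# create_dataset_test is my own splitting helper (definition not shown)
X_train_lm, X_test_lm, y_train_lm, y_test_lm = create_dataset_test(X, Y)
X_train_lm = X_train_lm.reshape((2596, -1))
lm = linear_model.LinearRegression()
model = lm.fit(X_train_lm, y_train_lm)
y_pred_lm = lm.predict(X_test_lm)   # this line raises the error below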

I get this error when I try to make predictions on X_test:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)

  • My X_train has this form:
[[-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 ...
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]]
  • And my y_train looks like this:
[[59.]
 [54.]
 [40.]
 ...
 [24.]
 [33.]
 [41.]]

  • The data I make predictions on (X_test_lm) has this form:
[[-0.76666002]
 [ 1.30435914]
 [-0.76666002]
 ...
 [-0.76666002]
 [-0.76666002]
 [-0.76666002]]

Dimension mismatch.

You have incompatible dimensions: X_test_lm has N samples (rows) but only 1 feature/variable (column), whereas X_train has 2.
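The mismatch can be reproduced without the original data. This is a minimal sketch that assumes random synthetic arrays in place of the asker's df (which is not shown): the model is fitted on 2 columns and then asked to predict on 1. Depending on the scikit-learn version, the ValueError either comes from NumPy's matmul (as in the question) or from an earlier feature-count check, so the sketch only relies on a ValueError being raised.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the question's data (the real df is not shown).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))   # (N, 2): two feature columns
y_train = rng.normal(size=(100, 1))   # (N, 1): one target column, like df[['age']]
X_test = rng.normal(size=(30, 1))     # (M, 1): one feature column is missing

model = LinearRegression().fit(X_train, y_train)

error_message = None
try:
    model.predict(X_test)
except ValueError as err:
    # Older scikit-learn versions fail inside NumPy's matmul (the message in the
    # question); newer ones may report the feature-count mismatch first instead.
    error_message = str(err)
print("predict on a (30, 1) array failed:", error_message)

# A test matrix with the same two columns as the training data works.
X_test_ok = rng.normal(size=(30, 2))
print(model.predict(X_test_ok).shape)  # (30, 1)
```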


Details:

Your X_train is:

[[-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 ...
 [-0.77046461 -0.77046461]
 [-0.77046461  1.29791815]
 [-0.77046461 -0.77046461]]

so the model is trained on N samples (rows) with 2 features/variables (columns).

Then, when you ask it to predict:

[[-0.76666002]
 [ 1.30435914]
 [-0.76666002]
 ...
 [-0.76666002]
 [-0.76666002]
 [-0.76666002]]

the dimensions are incompatible, since X_test_lm again has N samples (rows), but this time only 1 feature/variable (column).

But the model's predict function expects an input array with shape [N,2], so you get:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 1)
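To get a clearer message than the matmul one, you could compare the input's column count with the fitted coefficients before calling predict. A minimal sketch: `check_feature_count` is a hypothetical helper, not part of scikit-learn, and it assumes that the last axis of `coef_` is the number of features the model was fitted on (true for LinearRegression with 1-D or 2-D y).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def check_feature_count(model, X):
    """Hypothetical helper: raise a readable error if X has the wrong number of columns.

    coef_ has shape (n_features,) for 1-D y or (n_targets, n_features) for 2-D y,
    so its last axis is the number of features the model was fitted on.
    """
    expected = model.coef_.shape[-1]
    X = np.asarray(X)
    if X.ndim != 2 or X.shape[1] != expected:
        raise ValueError(
            f"predict() needs an array of shape (n_samples, {expected}), got {X.shape}"
        )
    return X


# Fit on an (N, 2) matrix, as in the question (toy data: y = 2*x1 + 3*x2).
X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y_train = np.array([[3.0], [2.0], [5.0], [0.0]])
model = LinearRegression().fit(X_train, y_train)

print(model.coef_.shape)                       # (1, 2): one target, two features
good = check_feature_count(model, [[1.0, 0.0]])
print(model.predict(good))                     # approximately [[2.]]
```

Passing an (N, 1) array such as X_test_lm to `check_feature_count` would raise a ValueError that states the expected and actual shapes.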

As you said, X_test_lm.shape is (1300, 1), so the model is trying to predict the values of these 1300 samples, each with only one feature. That's what triggers the error: the model was trained on an X_train with shape [N,2], not [N,1].


Since X_test_lm.shape is (1300, 1), the test data has only 1 column, not 2 like the training data. The beta (coefficient) vector fitted on the training data expects a matrix with 2 columns, which causes the error.
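The beta-vector explanation can be checked with plain NumPy: a linear model's prediction is X @ beta + intercept, and matmul requires the number of columns of X to equal the number of rows of beta. The coefficient values below are made up for illustration; only the shapes matter.

```python
import numpy as np

# Assumed fitted values for illustration: beta has one row per training column.
beta = np.array([[2.0], [3.0]])   # shape (2, 1): k = 2 features, m = 1 target
intercept = 0.5

X_train_like = np.array([[1.0, 0.0],
                         [0.0, 1.0]])    # shape (n, 2): matches k = 2
print(X_train_like @ beta + intercept)   # values 2.5 and 3.5, shape (2, 1)

X_test_like = np.array([[1.0],
                        [0.0]])          # shape (n, 1): has only 1 column
matmul_error = None
try:
    X_test_like @ beta
except ValueError as err:
    # (n, 1) @ (2, 1): the inner dimensions 1 and 2 differ, which produces the
    # same kind of "(n?,k),(k,m?)->(n?,m?)" message as in the question.
    matmul_error = str(err)
print(matmul_error)
```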

You should check the definition of create_dataset_test to see how you ended up in this state.
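If create_dataset_test turns out to drop a column from the test split, one alternative is scikit-learn's train_test_split, which splits rows only and keeps every feature column in both halves. This is a sketch under assumptions: the DataFrame below is a small synthetic stand-in for the asker's df, and train_test_split is swapped in for create_dataset_test, whose definition is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for the question's df (the real data is not shown).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "chronic_disease_binary": rng.integers(0, 2, size=200),
    "outcome": rng.integers(0, 2, size=200),
    "age": rng.integers(18, 90, size=200).astype(float),
})

features_2 = ["chronic_disease_binary", "outcome"]
X = df.loc[:, features_2].values   # shape (200, 2)
Y = df.loc[:, ["age"]].values      # shape (200, 1)

# train_test_split is used here in place of create_dataset_test;
# it splits rows only, so both halves keep both feature columns.
X_train_lm, X_test_lm, y_train_lm, y_test_lm = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Fail early, with the shapes in the message, if the column counts ever diverge.
assert X_train_lm.shape[1] == X_test_lm.shape[1] == len(features_2), (
    X_train_lm.shape, X_test_lm.shape
)

lm = LinearRegression()
lm.fit(X_train_lm, y_train_lm)
y_pred_lm = lm.predict(X_test_lm)
print(X_train_lm.shape, X_test_lm.shape, y_pred_lm.shape)  # (150, 2) (50, 2) (50, 1)
```

With the split done this way, the hard-coded `reshape((2596, -1))` from the question is no longer needed, since train_test_split already returns 2-D arrays.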

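Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.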

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM