Linear regression prediction not matching the training data

Question

I am new to machine learning. I am trying a simple prediction using linear regression with "made up" data that follows a specific pattern. For some reason, the predictions do not match the training data. Can you let me know what I need to fix? The sample code is below:

from sklearn import linear_model
import numpy as np

X = np.random.randint(3, size=(3, 1000))
Y = np.random.randint(10, size=(1, 1000))
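# f1, f2, f3 are the three features (rows of X); each has min = 0, max = 2
# (f1, f2) = (0, 1) or (1, 0): 7 <= Y < 10, irrespective of f3
# (f1, f2) = (1, 2) or (2, 1): Y is 0, irrespective of f3
# (f1, f2) = (0, 2) or (2, 0): if f3 = 2 then 3 <= Y < 7 else Y = 0
# any other combination: Y = 0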
for i in range(1000):
    if ((X[0][i] == 0 and X[1][i] == 1) or (X[0][i] == 1 and X[1][i] == 0)):
        Y[0][i] = np.random.randint(7, 10)
    elif ((X[0][i] == 1 and X[1][i] == 2) or (X[0][i] == 2 and X[1][i] == 1)):
        Y[0][i] = 0
    elif ((X[0][i] == 0 and X[1][i] == 2 and X[2][i] == 2) or
         (X[0][i] == 2 and X[1][i] == 0 and X[2][i] == 2)):
        Y[0][i] = np.random.randint(3, 7)
    else:
        Y[0][i] = 0

X1 = X.transpose()
Y1 = Y.reshape(-1, 1)
print(list(zip(X1, Y1)))

# create and fit the model
clf = linear_model.LinearRegression()
clf.fit(X1, Y1)

Z = np.array([[0, 0, 0, 0, 1, 1],
              [1, 1, 2, 2, 2, 2],
              [1, 2, 1, 2, 1, 2]])
Z1 = Z.transpose()
print(Z1)

y_predict = clf.predict(Z1)
print(y_predict)

Answer 1

And why would it match the training data? Your X -> Y relation is clearly non-linear. Only for a perfectly linear relation, meaning Y = AX + b, can you expect linear regression to fit the training data perfectly. Otherwise, the fitted model can be arbitrarily far from the data; see, for example, Anscombe's quartet (the original answer shows the image from Wikipedia).
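To make the point concrete, here is a minimal sketch that fits ordinary least squares to two targets built from the same features: one that is exactly linear, and one that follows a rule similar to the one in the question. It is an illustration under assumptions, not the asker's code: it uses NumPy's `np.linalg.lstsq` instead of scikit-learn, the helper `fit_and_score` is a hypothetical name, the linear coefficients are arbitrary, and the fixed values 8 and 5 stand in for the random ranges in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 samples, 3 features, each feature in {0, 1, 2} (as in the question)
X = rng.integers(0, 3, size=(1000, 3))


def fit_and_score(X, y):
    """Fit least squares with an intercept and return R^2 on the training data.

    Hypothetical helper for illustration only.
    """
    A = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Case 1: the target is exactly linear, Y = AX + b (coefficients are arbitrary)
y_linear = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + 3.0
r2_linear = fit_and_score(X, y_linear)

# Case 2: a rule-based target similar to the question's
# (assumption: fixed values 8 and 5 replace the random ranges)
f1, f2, f3 = X[:, 0], X[:, 1], X[:, 2]
y_rule = np.zeros(len(X))
y_rule[((f1 == 0) & (f2 == 1)) | ((f1 == 1) & (f2 == 0))] = 8.0
y_rule[((f1 == 0) & (f2 == 2) & (f3 == 2)) |
       ((f1 == 2) & (f2 == 0) & (f3 == 2))] = 5.0
r2_rule = fit_and_score(X, y_rule)

print("R^2, linear target:    ", r2_linear)
print("R^2, rule-based target:", r2_rule)
```

The linear target is reproduced essentially exactly, while the rule-based target leaves a large residual even though it contains no noise at all. No choice of weights on f1, f2 and f3 can be high for (0, 1) and (1, 0) yet zero for both (0, 0) and (1, 1), so the mismatch comes from the model class, not from a bug in the fitting code.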

1 · accepted · 2016-04-02 09:00:57
