Multivariate polynomial regression in R (prediction)

Question

I am building a predictive model using a 60/40 train/test split. I want to fit a polynomial regression model with 10 explanatory variables.

First, I build the model on the training data, regressing on training$y.

model_poly = lm(training$y ~ poly(training$x1, degree=2, raw=TRUE) +
     poly(training$x2, degree=2, raw=TRUE) +
     poly(training$x3, degree=2, raw=TRUE) +
     poly(training$x4, degree=2, raw=TRUE) +
     poly(training$x5, degree=2, raw=TRUE) +
     poly(training$x6, degree=2, raw=TRUE) +
     poly(training$x7, degree=2, raw=TRUE) +
     poly(training$x8, degree=2, raw=TRUE) +
     poly(training$x9, degree=2, raw=TRUE) +
     poly(training$x10, degree=2, raw=TRUE))

After that, I want to use this model to predict on new data (test).

poly_predictions = predict(model_poly, poly(test$x1, degree=2, raw=TRUE)+
     poly(test$x2, degree=2, raw=TRUE) +
     poly(test$x3, degree=2, raw=TRUE) +
     poly(test$x4, degree=2, raw=TRUE) +
     poly(test$x5, degree=2, raw=TRUE) +
     poly(test$x6, degree=2, raw=TRUE) +
     poly(test$x7, degree=2, raw=TRUE) +
     poly(test$x8, degree=2, raw=TRUE) +
     poly(test$x9, degree=2, raw=TRUE) +
     poly(test$x10, degree=2, raw=TRUE))

The test data has about 200,000 rows and the training data has about 300,000 rows.

The problem is that poly_predictions has the dimensions of the training data, when it should have the dimensions of the test data. So something has gone wrong.

What am I missing here? When I predict with a simple linear model, e.g.

model_lm = lm(training$y ~ ., training)
lm_predictions = predict(model_lm, test)

I have no problem.

Answer 1

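You are over-specifying the problem. Because your model formula uses training$x1, that is the exact variable predict will look for when making predictions, regardless of the new data you pass in. Instead, take advantage of the fact that the columns share names across both data frames, and create the model as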

model_poly = lm(y ~ poly(x1, degree=2, raw=TRUE) +
  poly(x2, degree=2, raw=TRUE), data=df.training)

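This produces a model in terms of the abstract variables x1, x2, and so on, which are resolved by name in whatever data frame is supplied.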
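With 10 predictors, typing every poly() term by hand gets tedious. One possible way to build the same kind of formula programmatically is sketched below. The column names x1 to x10 and y follow the question; the synthetic data frame is only a stand-in so the snippet runs on its own and is not the asker's data.

```r
# Predictor names x1..x10, assumed to match the columns in the question.
predictors <- paste0("x", 1:10)

# One raw quadratic term per predictor, e.g. "poly(x1, degree = 2, raw = TRUE)".
term_labels <- sprintf("poly(%s, degree = 2, raw = TRUE)", predictors)

# reformulate() joins the term labels with "+" and attaches the response.
poly_formula <- reformulate(term_labels, response = "y")

# Synthetic stand-in for the training data, for illustration only.
set.seed(42)
df.training <- as.data.frame(matrix(rnorm(200 * 10), ncol = 10,
                                    dimnames = list(NULL, predictors)))
df.training$y <- rnorm(200)

# The formula uses bare column names, so they are looked up in `data`.
model_poly <- lm(poly_formula, data = df.training)
length(coef(model_poly))  # intercept plus two coefficients per predictor
```

Because the formula never mentions df.training directly, the fitted model can be applied to any data frame that has columns with the same names.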

You can then call predict like this (you can omit the poly call here, because it is already baked into the model):

predict(model_poly, df.test)

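This produces the desired result. Otherwise, if the lengths differ, you will generally get a warning that the output does not match the newdata supplied to predict.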
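The difference between the two approaches can be checked on a small synthetic example. This is a minimal sketch, assuming two predictors instead of ten and made-up data; df.training and df.test mirror the names used in this answer, and bad_model is a hypothetical name for the question's style of formula.

```r
set.seed(1)  # synthetic data, for illustration only

n <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 1 + 2 * df$x1 - 0.5 * df$x1^2 + 0.3 * df$x2^2 + rnorm(n, sd = 0.1)

# 60/40 split into training and test rows.
train_idx <- sample(n, size = round(0.6 * n))
df.training <- df[train_idx, ]
df.test <- df[-train_idx, ]

# Formula with bare column names, resolved through `data`.
model_poly <- lm(y ~ poly(x1, degree = 2, raw = TRUE) +
                   poly(x2, degree = 2, raw = TRUE),
                 data = df.training)

# predict() finds x1 and x2 in newdata: one prediction per test row.
poly_predictions <- predict(model_poly, newdata = df.test)
length(poly_predictions) == nrow(df.test)

# For contrast: hard-coding df.training$... in the formula means predict()
# evaluates those training vectors again and ignores the columns of newdata.
bad_model <- lm(df.training$y ~ poly(df.training$x1, degree = 2, raw = TRUE))
bad_predictions <- suppressWarnings(predict(bad_model, newdata = df.test))
length(bad_predictions) == nrow(df.training)
```

The suppressWarnings() call is only there to keep the output quiet; without it, the second predict() call emits the row-mismatch warning described above.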


Answer 1
1 accepted 2018-05-09 19:52:14
