
How to properly use the predict function in R

First I'm going to give you some starter code:

library(ggplot2)

y = c(0, 0, 1, 2, 0, 0, 1, 3, 0, 0, 3, 0, 6, 2, 8, 16, 21, 39, 48, 113,
      92, 93, 127, 159, 137, 46, 238, 132, 124, 185, 171, 250, 250, 187, 119,
      151, 292, 94, 281, 146, 163, 104, 156, 272, 273, 212, 210, 135, 187, 208,
      310, 276, 235, 246, 190, 232, 254, 446, 314, 402, 276, 279, 386, 402, 238,
      581, 434, 159, 261, 356, 440, 498, 495, 462, 306, 233, 396, 331, 418, 293,
      431, 300, 222, 222, 479, 501, 702, 790, 681)
x = 1:length(y)

Now I'm trying to predict what the 90th data point will be using polynomial regression, where data point #1 is 0 and #89 is 681. I've tested my model and decided that an 8th-degree polynomial curve is the perfect fit.

I've tried the code predict(formula=y~poly(x,8),90) and it's giving some strange error (which doesn't make sense to me) about how there is no applicable method.

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "c('double', 'numeric')"

Why doesn't this work? After scouring countless R documentation pages, blogs and forums, it seemed to me that this should work properly.

What does work, instead? I've tried other ways of using the predict method, and I think that this is the closest solution to what I want: the predicted value for the 90th data point.

Any other suggestions? I'm not sure that my model is the best, and I would welcome any suggestions you may have. For example, you may argue that it's better to use a 6th degree than an 8th degree polynomial for modeling, and if you have a valid reason, I would agree with you.

Thank you!

NOTE: Please, PLEASE don't remove the thanks. I know some Stack Overflowers hate it, but I feel it gives a personal touch.

predict works on models. You have a formula, but not a model. You need to fit a model first, and then predict on that.
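The error message itself hints at what happened. predict is an S3 generic that dispatches on the class of its first argument, object. In the original call, formula= is a named argument the generic does not use for dispatch, so the bare 90 ends up as object, and there is no predict method for a plain number. A minimal sketch of that behaviour, using made-up data (the x and y below are illustrative and are not the series from the question):

```r
# Made-up, exactly quadratic data (y = 0.5*x^2 + 0.5*x + 1), for illustration only.
x <- 1:10
y <- 0.5 * x^2 + 0.5 * x + 1

# The bare 90 is matched to predict()'s first argument, `object`. There is no
# predict method for a plain number, so dispatch fails with an error.
err <- tryCatch(
  predict(formula = y ~ poly(x, 2), 90),
  error = function(e) e
)
print(conditionMessage(err))

# Fitting first gives predict() an object of class "lm" to dispatch on.
fit <- lm(y ~ poly(x, 2))
print(class(fit))

# newdata must contain a column named like the predictor in the formula.
pred <- predict(fit, newdata = data.frame(x = 11))
print(pred)
```

Because the toy data are exactly quadratic, the prediction at x = 11 should match 0.5*11^2 + 0.5*11 + 1 up to floating-point error.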

Usually this is done in two steps, because usually people want to save the model so it can be used for more than just a single prediction - perhaps to examine coefficients, check assumptions, get model fit diagnostics, make a different prediction - without re-fitting the model.
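As a rough sketch of why saving the fitted object pays off, here is one model being reused for several of those tasks. The data are simulated purely for illustration (a straight line plus noise, not the series from the question), and the object names are my own:

```r
# Simulated data for illustration only (not the asker's series).
set.seed(1)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 5)

fit <- lm(y ~ x)  # fit once, keep the object

coefs <- coef(fit)               # examine coefficients
r2    <- summary(fit)$r.squared  # one model fit diagnostic
res   <- residuals(fit)          # residuals, e.g. for checking assumptions

# Several different predictions from the same fitted object, no re-fitting.
p_point <- predict(fit, newdata = data.frame(x = 51))
p_many  <- predict(fit, newdata = data.frame(x = 51:53))
# interval = "prediction" adds lower and upper bounds to the point estimate.
p_int   <- predict(fit, newdata = data.frame(x = 51), interval = "prediction")

print(coefs)
print(r2)
print(p_int)
```

The prediction interval is often more informative than the single number, especially when extrapolating past the end of the data.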

Here I'll use the simplest model that can take your formula, lm, which stands for "linear model". You could also use a GLM, or loess, or a random forest, a GAM, a neural net, or... many many many different models.
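To illustrate that the fit-then-predict pattern carries over to other model classes, here is a sketch comparing lm with a Poisson GLM, which is one common choice for count-like data. The data are simulated for illustration, and picking a Poisson family is my assumption, not something the question establishes:

```r
# Simulated count data for illustration only.
set.seed(42)
x <- 1:60
y <- rpois(60, lambda = exp(0.5 + 0.05 * x))

# Two different model classes, each fitted from a formula.
fit_lm  <- lm(y ~ poly(x, 2))
fit_glm <- glm(y ~ x, family = poisson)

new_x <- data.frame(x = 61)

p_lm <- predict(fit_lm, newdata = new_x)
# For a GLM, type = "response" returns the prediction on the scale of y;
# the default, type = "link", would be on the log scale for this family.
p_glm      <- predict(fit_glm, newdata = new_x, type = "response")
p_glm_link <- predict(fit_glm, newdata = new_x)

print(p_lm)
print(p_glm)
```

The call to predict looks the same in both cases; only the method that gets dispatched, and the extra arguments it understands, differ.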

my_model = lm(formula=y~poly(x,8))
predict(my_model, newdata = list(x = 90))
#        1 
# 977.9421 

You could, of course, combine this into a single line, never bothering to save and name my_model:

predict(lm(formula=y~poly(x,8)), newdata = list(x = 90))

I'm not sure that my model is the best,

It's not. Almost certainly. But that's okay - it's very hard to know that a model is best in any sense of the word.

and I would welcome any suggestions you may have. For example, you may argue that it's better to use a 6th degree than an 8th degree polynomial for modeling,

I don't think I've ever seen an 8th degree polynomial used. (Or even 6th.) It's absurdly high. I have no idea what your data is, so I can't say much. If you have a reason to think that an 8th degree polynomial is accurate, then go for it. But if you just want to fit a wiggly curve and extrapolate forward a tiny bit, then a cubic spline using mgcv::gam or a stats::loess model would be a much more standard choice.
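A rough sketch of those two alternatives, on simulated data (not the series from the question). One thing to watch for: with its default settings, predict on a loess fit returns NA outside the observed x range, so extrapolating one step ahead needs surface = "direct". The mgcv part only runs if that package is installed, and the choice of bs = "cr" (a cubic regression spline basis) is just one reasonable option:

```r
# Simulated wiggly data for illustration only.
set.seed(7)
x <- 1:80
y <- 50 * sin(x / 12) + 3 * x + rnorm(80, sd = 10)
new_x <- data.frame(x = 81)  # one step past the end of the data

# Default loess interpolates only: predictions outside the data range are NA.
fit_default <- loess(y ~ x)
p_default   <- predict(fit_default, newdata = new_x)

# surface = "direct" fits locally at the requested point and can extrapolate.
fit_direct <- loess(y ~ x, control = loess.control(surface = "direct"))
p_direct   <- predict(fit_direct, newdata = new_x)

print(p_default)
print(p_direct)

# Cubic regression spline via mgcv, if the package is available.
if (requireNamespace("mgcv", quietly = TRUE)) {
  suppressPackageStartupMessages(library(mgcv))
  fit_gam <- gam(y ~ s(x, bs = "cr"))
  p_gam   <- predict(fit_gam, newdata = new_x)
  print(p_gam)
}
```

Either way, extrapolation from a smoother is only trustworthy a short distance beyond the data, which is the "tiny bit" caveat above.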
