
In R, how do you get the best fitting equation to a set of data?

I'm not sure whether R can do this (I assume it can, but maybe that's just because I tend to assume that R can do anything :-)). What I need is to find the best fitting equation to describe a dataset.

For example, if you have these points:

df = data.frame(x = c(1, 5, 10, 25, 50, 100), y = c(100, 75, 50, 40, 30, 25))

How do you get the best fitting equation? I know that you can get the best fitting curve with:

plot(loess(df$y ~ df$x))

But as I understand it, you can't extract an equation from a loess fit; see the question "Loess Fit and Resulting Equation".

When I try to build it myself (note, I'm not a mathematician, so this is probably not the ideal approach :-)), I end up with something like:

y.predicted = 12.71 + ( 95 / (( (1 + df$x) ^ .5 ) / 1.3))

This seems to approximate it reasonably well, but I can't help thinking that something more elegant probably exists :-)
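One way to make the hand-built formula less ad hoc: it simplifies to y = a + b / sqrt(1 + x) with a = 12.71 and b = 95 * 1.3 = 123.5, and that form is linear in a and b, so lm() can estimate both. This is only a sketch that keeps the square-root shape from the question as an assumption; the names y_hand, fit_sqrt, sse_hand and sse_fit are made up for illustration.

```r
# Data from the question
df <- data.frame(x = c(1, 5, 10, 25, 50, 100),
                 y = c(100, 75, 50, 40, 30, 25))

# Hand-built formula from the question, evaluated at the data points
y_hand <- 12.71 + (95 / (((1 + df$x)^0.5) / 1.3))

# Same shape, y = a + b / sqrt(1 + x), with a and b estimated by least squares.
# I() makes R treat the expression as arithmetic rather than formula syntax.
fit_sqrt <- lm(y ~ I(1 / sqrt(1 + x)), data = df)
a <- unname(coef(fit_sqrt)[1])
b <- unname(coef(fit_sqrt)[2])

# Compare the two by their sum of squared errors (smaller is better)
sse_hand <- sum((df$y - y_hand)^2)
sse_fit  <- sum(residuals(fit_sqrt)^2)

cat(sprintf("y = %.2f + %.2f / sqrt(1 + x)\n", a, b))
cat(sprintf("SSE hand-built: %.1f, SSE least squares: %.1f\n", sse_hand, sse_fit))
```

Because the hand-built coefficients are one particular choice of a and b within the same family, the least-squares version cannot have a larger squared error on these six points.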

I have the feeling that fitting a linear or polynomial model wouldn't work either, because the formula seems different from what those models generally use (i.e. this one seems to need divisions, powers, etc.). For example, the approach in "Fitting polynomial model to data in R" gives pretty bad approximations.
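Divisions and powers don't rule out least squares. As a rough sketch, assume the curve has the form y = a + b * (1 + x)^(-p): for any fixed exponent p the model is linear in a and b, so lm() handles those two and a one-dimensional search with optimize() handles p. The functional form, the search interval c(0.05, 2) and the names profile_sse, p_hat and fit_pow are all assumptions for illustration, not something from the question.

```r
# Data from the question
df <- data.frame(x = c(1, 5, 10, 25, 50, 100),
                 y = c(100, 75, 50, 40, 30, 25))

# Sum of squared residuals for a given exponent p,
# with a and b fitted by ordinary least squares
profile_sse <- function(p) {
  sum(residuals(lm(y ~ I((1 + x)^(-p)), data = df))^2)
}

# One-dimensional search over the exponent (interval chosen by hand)
best  <- optimize(profile_sse, interval = c(0.05, 2))
p_hat <- best$minimum

# Refit at the chosen exponent to read off a and b
df$z    <- (1 + df$x)^(-p_hat)
fit_pow <- lm(y ~ z, data = df)
a <- unname(coef(fit_pow)[1])
b <- unname(coef(fit_pow)[2])

cat(sprintf("y = %.2f + %.2f * (1 + x)^(-%.3f)\n", a, b, p_hat))
cat(sprintf("SSE at fitted exponent: %.1f, SSE at p = 0.5: %.1f\n",
            best$objective, profile_sse(0.5)))
```

In this form a is the value the curve approaches as x grows, so for a pricing curve it is worth checking that the fitted a is a sensible price before using the equation outside the range of the data.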

I remember from a long time ago that there are languages (Matlab may be one of them?) that do this kind of thing. Can R do this as well, or am I in the wrong place?

(Background info: basically, what we need to do is find an equation that determines the numbers in the second column from the numbers in the first column; but we decide the numbers ourselves. We have an idea of what we want the curve to look like, but we can adjust these numbers to match an equation if that gives a better fit. It's about the pricing for a product (a cheaper alternative to current expensive software for qualitative data analysis); the more 'project credits' you buy, the cheaper it should become. Rather than forcing people to buy a given number (i.e. 5 or 10 or 25), it would be nicer to let people buy exactly what they need, but of course this requires a formula. We have an idea of some prices we think are OK, but now we need to translate this into an equation.)

Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

The code above should give you the coefficients of the linear model that best fits your data in the ordinary least squares (OLS) sense.
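The snippet above uses placeholder names (mydata, x1, x2, x3). Applied to the single-predictor data from the question, a sketch might look like the following; fit_line and equation are names made up here, and a straight line is only the simplest possible candidate for this curve.

```r
# Data from the question
df <- data.frame(x = c(1, 5, 10, 25, 50, 100),
                 y = c(100, 75, 50, 40, 30, 25))

# Straight-line fit by ordinary least squares
fit_line <- lm(y ~ x, data = df)
print(summary(fit_line))  # coefficients, standard errors, R-squared

# The fitted equation is read off the coefficient vector
co <- coef(fit_line)
equation <- sprintf("y = %.3f + (%.3f) * x", co[["(Intercept)"]], co[["x"]])
print(equation)
```

The residuals from plot(df$x, residuals(fit_line)) are a quick way to see whether a straight line is adequate or whether the data bend away from it.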

My usual plug: http://creativemachines.cornell.edu/eureqa

But as Roland said, the "best fit in general" has little meaning, since almost any reasonably smooth function can be approximated by a Taylor series, so some polynomial can always be made to fit. Since a set of data is expected to have noise (errors) in its values, a big part of curve fitting is deciding what is noise and what isn't.
If you pick a fit function arbitrarily, one thing I can pretty much guarantee is that extrapolated points will diverge in a hurry.
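To illustrate the extrapolation point with the data from the question: a degree-5 polynomial has six coefficients, so it passes through all six points exactly, yet its predictions outside the data range bear no resemblance to a price curve. The comparison model y = a + b / sqrt(1 + x) is an assumed shape borrowed from the question, and fit_interp, fit_sqrt and new_x are illustrative names.

```r
# Data from the question
df <- data.frame(x = c(1, 5, 10, 25, 50, 100),
                 y = c(100, 75, 50, 40, 30, 25))

# Degree-5 polynomial: six coefficients for six points, so a perfect fit
fit_interp <- lm(y ~ poly(x, 5), data = df)

# Two-parameter alternative with a decaying shape
fit_sqrt <- lm(y ~ I(1 / sqrt(1 + x)), data = df)

# Predict beyond the largest observed x (100)
new_x <- data.frame(x = c(150, 200))
pred_interp <- unname(predict(fit_interp, newdata = new_x))
pred_sqrt   <- unname(predict(fit_sqrt, newdata = new_x))

print(data.frame(x = new_x$x,
                 interpolating_poly = pred_interp,
                 sqrt_form = pred_sqrt))

# In-sample error says nothing about this: the polynomial's residuals are ~0
print(max(abs(residuals(fit_interp))))
```

The polynomial wins on in-sample error by construction, which is why the in-sample fit alone is a poor way to choose between candidate equations.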
