How to fit a data set to a specific function by trial and error or a better specific alternative in R?

I have a data set that I want to fit to the following function, finding the parameters a and b:

y = (a * b * x) / (1 + b * x)

I tried the nonlinear least squares approach; however, I'd like to try trial and error instead, using a vector of candidate values for a and another for b, then plotting all the combinations of these values to choose a better fit.

library(readxl)
library(ggplot2)

x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

df <- data.frame(x,y)

ggplot(data = df, aes(x, y))+
  geom_point()+
  stat_smooth(method="nls",
              se=FALSE,
              formula = y ~ (a*b*x)/(1+(b*x)),
              method.args = list(start = c(a=2.86, b=0.032)))

I wonder if you're a bit mistrustful of the output of nls, thinking that perhaps you could find a better fit yourself?

Here's a way to at least give you a better feel for the fit created by different values of a and b. The idea is that we create a plot with all the values of a on the x axis, and all the values of b on the y axis. For each pair of a and b we work out how close the resulting curve would be to our data (by taking the log of the sum of squared residuals). If the fit is good, we colour it with a bright colour, and if the fit is bad we colour it with a darker colour. This allows us to see the types of combinations that will make good fits - effectively a heat map of the parameters.

# Our actual data, put in a data frame:
df <- data.frame(x = c(52.67, 46.80, 41.74, 40.45), y = c(1.73, 1.84, 1.79, 1.45))

# Create a grid of all a and b values we want to compare
a <- seq(-5, 10, length.out = 200)
b <- seq(0, 0.5, length.out = 100)
all_mixtures <- setNames(expand.grid(a, b), c("a", "b"))

# Get the log of the sum of squared residuals for each (a, b) pair.
# i[1] is a and i[2] is b; note df$y rather than a global y.
all_mixtures$ss <- apply(all_mixtures, 1, function(i) {
  log(sum((i[1] * i[2] * df$x / (1 + i[2] * df$x) - df$y)^2))
})

Now we plot the heatmap:

p <- ggplot(all_mixtures, aes(a, b, fill = ss)) +
  geom_tile() + 
  scale_fill_gradientn(colours = c("white", "yellow", "red", "blue")) 
p

Clearly, the optimum pair of a and b lies somewhere on the white line.
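
Since the question asked for a trial-and-error answer, the grid itself can also report its best pair. The following is only a sketch: the names grid, sse and best are illustrative and not part of the original code, and it minimises the plain sum of squares (taking the log does not change where the minimum is). The result is only as precise as the grid spacing.

```r
# Data from the question
df <- data.frame(x = c(52.67, 46.80, 41.74, 40.45),
                 y = c(1.73, 1.84, 1.79, 1.45))

# Same candidate values for a and b as in the grid above
grid <- expand.grid(a = seq(-5, 10, length.out = 200),
                    b = seq(0, 0.5, length.out = 100))

# Sum of squared residuals for one (a, b) pair (illustrative helper)
sse <- function(a, b) sum((a * b * df$x / (1 + b * df$x) - df$y)^2)
grid$ss <- mapply(sse, grid$a, grid$b)

# The row with the smallest sum of squares is the best pair on this grid
best <- grid[which.min(grid$ss), ]
print(best)
```

Refining the grid around that row and repeating would be the manual equivalent of what an optimiser does automatically.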

Now let's see where nls thought the best combination of a and b was:

# annotate() draws a single point instead of one per row of all_mixtures
p + annotate("point", x = 2.8312323, y = 0.0334379, size = 5)
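
For reference, estimates like the ones plotted here can be read off a direct call to nls() rather than through stat_smooth(). This is a minimal sketch, assuming nls() converges from the question's starting values; the tryCatch() wrapper and the name fit are additions for illustration, and the printed estimates are not guaranteed to match the coordinates above digit for digit.

```r
x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

# Fit the same model as in the question, with the same starting values.
# tryCatch() returns NULL if nls() fails to converge on so few points.
fit <- tryCatch(
  nls(y ~ (a * b * x) / (1 + b * x),
      start = list(a = 2.86, b = 0.032)),
  error = function(e) NULL
)

# Print the estimated a and b if the fit succeeded
if (!is.null(fit)) print(coef(fit))
```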

It looks as though it has found the optimum just at the "bend" of the white line, which is probably what you would have guessed.

It looks like if you stray outside this white line, your fit will be worse, and you're not going to find anywhere on the white line that's better.

Trust nls. Yes, the fit doesn't look very good, but that's simply because the data don't fit this particular formula very well, however you set its parameters. If your model has to be in this form, and these are your data, this is the best fit you are going to get.

What constitutes a better fit? Mathematically speaking, the best fit is the one that optimizes a goodness-of-fit metric. Let's obtain parameters a and b that minimize the sum of squares of deviations (the least-squares method):

First, define your metric (least_squares below):

x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

y_hat <- function(x, a, b){
  a*b*x/(1 + b*x)
}

least_squares <- function(par, y, x){
  sum((y - y_hat(x, par[1], par[2]))^2)
}

After this, we minimize the metric with respect to a and b. One can use R's machinery for multivariate optimization (e.g., optim) for that:

optim(c(2.86, 0.032), least_squares, y=y, x=x)

which gives optimal values for the parameters:

$par
[1] 2.8312323 0.0334379

Here, c(2.86, 0.032) is an initial guess for the parameter values. You are free to define your own metric (for example, the sum of absolute deviations, a weighted sum of squares, etc.) according to what you need and optimize it. You can play with the settings, but given how simple the example is, you are unlikely to arrive at a different result for the same optimization metric.
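
As an illustration of swapping the metric, here is a sketch that minimizes the sum of absolute deviations instead of the sum of squares. The names abs_dev, start and fit_abs are hypothetical and chosen only for this example; the optimizer is optim's default (Nelder-Mead), started from the same initial guess.

```r
x <- c(52.67, 46.80, 41.74, 40.45)
y <- c(1.73, 1.84, 1.79, 1.45)

# Model prediction, as defined above
y_hat <- function(x, a, b) {
  a * b * x / (1 + b * x)
}

# Alternative metric (illustrative): sum of absolute deviations
abs_dev <- function(par, y, x) {
  sum(abs(y - y_hat(x, par[1], par[2])))
}

start <- c(2.86, 0.032)
fit_abs <- optim(start, abs_dev, y = y, x = x)

fit_abs$par    # a and b under this metric
fit_abs$value  # value of the metric at those parameters
```

The parameters found this way need not coincide with the least-squares ones, since the two metrics penalize large residuals differently.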
