简体   繁体   English

如何编写R中系数趋于0的函数?

[英]How to write a function where the coefficient tends to 0 in R?

I would like to write a function that smooths the coefficient of growth rate to 0 in 60 days.我想编写一个函数,在 60 天内将增长率系数平滑为 0。 So far I managed to write the following code:到目前为止,我设法编写了以下代码:

corona <- data.frame(Cases = c(3, 16, 79, 157, 229, 322, 400, 650, 888, 1128, 1694, 2036, 2502, 3089, 3858), Date = seq(as.Date("2020/02/20"), as.Date("2020/03/05"), by = "days"))

library(dplyr)
corona_entire <- corona %>% mutate(Growth = (Cases - lag(Cases))/lag(Cases)*100)  

mean(corona_entire$Growth[12:15])

ff = function(x) x*(1.2285823)^60

ff(3858)

However, in my function the growth rate (0.2285823) is constant over 60 periods.但是,在我的函数中,增长率 (0.2285823) 在 60 个周期内保持不变。 I would like to tell R to make that growth rate tend to 0 as we get closer and closer to 60. I need to write a convergence function for the growth rate basically.我想告诉 R 随着我们越来越接近 60,使增长率趋于 0。我需要为增长率编写一个收敛函数。

Any idea how can I code it?知道如何编码吗?

Thanks!谢谢!

Further to my comment above, it's not clear to me what you're trying to do.除了我上面的评论之外,我不清楚你想要做什么。 If you want to model the Growth rate you need to fit some form of model.如果要对Growth建模,则需要拟合某种形式的模型。

For a start, how about an exponential model of the form y = y0 * exp(k * time) ?首先, y = y0 * exp(k * time)形式的指数模型怎么样?

In that case we can linearise the model (and data) by taking the log, and then use lm to estimate the model coefficients log(y0) and k .在这种情况下,我们可以通过取对数来线性化模型(和数据),然后使用lm来估计模型系数log(y0)k

df <- corona_entire %>% mutate(Time = as.integer(Date - min(Date)))
fit <- lm(log(Growth) ~ Time, weights = df$Growth, data = df)

Here I have used a weighted least squares approach by weighting every point by its Growth rate.在这里,我使用了加权最小二乘法,通过按其Growth每个点进行加权。

We can then plot the points and best fit curve:然后我们可以绘制点和最佳拟合曲线:

f <- function(x, fit) exp(coef(fit)[1])*exp(coef(fit)[2] * x)
ggplot(df, aes(Time, Growth)) +
    geom_point() +
    stat_function(fun = f, args = list(fit = fit)) +
    labs(x = sprintf("Days since %s", min(df$Date)))

在此处输入图片说明

Not a good fit but this should give you some ideas.不太合适,但这应该会给你一些想法。 You probably want to fit a more suitable non-linear growth-rate model, and estimate parameters using nls .您可能想要拟合更合适的非线性增长率模型,并使用nls估计参数。


Update更新

Since you really want to predict Cases , let's re-formulate our model.既然你真的想预测Cases ,让我们重新制定我们的模型。

We start again with an exponential model of the form Cases ~ y0 * exp(k * Time)我们再次从Cases ~ y0 * exp(k * Time)形式的指数模型开始

ggplot(df, aes(Time, Cases)) +
    geom_point()
fit1 <- lm(log(Cases) ~ Time, data = df)
f1 <- function(x, fit) exp(coef(fit)[1])*exp(coef(fit)[2] * x)
ggplot(df, aes(Time, Cases)) +
    geom_point() +
    stat_function(fun = f1, args = list(fit = fit1)) +
    labs(x = sprintf("Days since %s", min(df$Date)))

在此处输入图片说明

Not a good fit!不太合适! Results seem to suggest sub-exponential growth.结果似乎表明次指数增长。 A simple model for sub-exponential growth in epidemiology is a model of the form Cases ~ (r / m * Time + A)^m , see eg Chowell et al., Phys.流行病学中次指数增长的简单模型是Cases ~ (r / m * Time + A)^m形式的模型,参见Chowell 等人,Phys. Life Rev. 18, 66 (2016) . Life Rev. 18, 66 (2016)

So let's fit the model, this time using the non-linear least-squares routine nls .所以让我们拟合模型,这次使用非线性最小二乘例程nls

fit2 <- nls(
    Cases ~ (r / m * Time + A)^m,
    data = df,
    start = list(r = 4, m = 3, A = 1))
f2 <- function(x, r, m, A) (r / m * x + A)^m
ggplot(df, aes(Time, Cases)) +
    geom_point() +
    stat_function(
        fun = f2, 
        args = list(
            r = coef(fit2)[1],
            m = coef(fit2)[2],
            A = coef(fit2)[3])) +
    labs(x = sprintf("Days since %s", min(df$Date)))

在此处输入图片说明

Looks like a decent fit.看起来很合身。 You can inspect the quality of the fit and the non-linear least-squares estimates for the coefficients if you type summary(fit2)如果您键入summary(fit2)您可以检查拟合的质量和系数的非线性最小二乘估计

summary(fit2)
#
#Formula: Cases ~ (r/m * Time + A)^m
#
#Parameters:
#  Estimate Std. Error t value Pr(>|t|)
#r   2.3308     0.6543   3.562  0.00391 **
#m   3.3316     0.4202   7.929 4.12e-06 ***
#A   2.1101     0.3126   6.750 2.04e-05 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 51.41 on 12 degrees of freedom
#
#Number of iterations to convergence: 6
#Achieved convergence tolerance: 6.514e-07
#

If you just want a linear fall in growth rate towards 1 over 60 days you can do this:如果您只想在 60 天内将增长率线性下降到 1,您可以这样做:

ff = function(initial_n, initial_rate = 1.2285823, days = 60, time_to_stasis = 60)
{
  daily_rate <- seq(initial_rate, 1, length.out = time_to_stasis)
  result <- numeric(days)
  result[1] <- initial_n
  for(i in seq(days - 1)) result[i + 1] <- floor(daily_rate[i] * result[i])
  return(result)
}

So you get a daily number like this:所以你会得到一个这样的每日数字:

ff(3858)
#>  [1]    3858    4739    5803    7084    8620   10456   12643   15239   18309   21926
#> [11]   26173   31141   36932   43656   51436   60403   70699   82477   95897  111129
#> [21]  128350  147743  169494  193790  220818  250760  283791  320073  359754  402961
#> [31]  449796  500332  554607  612621  674331  739644  808418  880454  955498 1033237
#> [41] 1113297 1195248 1278600 1362812 1447290 1531398 1614460 1695773 1774611 1850239
#> [51] 1921922 1988936 2050581 2106192 2155151 2196899 2230944 2256873 2274360 2283171

and you can adjust the parameters to whatever you like.您可以根据自己的喜好调整参数。

You could use it to plot projections like this:你可以用它来绘制这样的投影:

plot(1:60, ff(3858))

在此处输入图片说明

I'm not sure how biologically plausible this is though.不过,我不确定这在生物学上是否合理。

Looking at the data, it looks like a quadratic curve is the better option to model Cases as a function of days查看数据,看起来二次曲线是将Cases建模为days的函数的更好选择

corona$days = as.numeric(corona$Date - corona$Date[1], "days") + 1
mod = lm(Cases ~ poly(days, 2, raw = TRUE), corona)
summary(mod)

#Call:
#lm(formula = Cases ~ poly(days, 2, raw = TRUE), data = corona)

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-140.48  -50.63  -24.30   65.89  148.04 

#Coefficients:
#                           Estimate Std. Error t value Pr(>|t|)    
#(Intercept)                 264.912     84.071   3.151  0.00836 ** 
#poly(days, 2, raw = TRUE)1 -158.269     24.179  -6.546 2.75e-05 ***
#poly(days, 2, raw = TRUE)2   25.863      1.469  17.600 6.17e-10 ***
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 94.38 on 12 degrees of freedom
#Multiple R-squared:  0.9949,   Adjusted R-squared:  0.9941 
#F-statistic:  1181 on 2 and 12 DF,  p-value: 1.668e-14


plot(corona$days, corona$Cases)
lines(predict(mod, data.frame(days = corona$days)))

# Growth Rate 
d = predict(mod, data.frame(days = 59:60))
diff(d)/d[1]
#         2 
#0.03606188 

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM