Using lm(), nls() (and glm()?) to estimate population growth rate in Malthusian growth model

My question is about estimating the population growth rate in the Malthusian growth model. As a toy example, consider the dataset df:

structure(list(x= c(0L, 24L, 48L, 72L, 96L, 120L, 144L, 168L
), y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469, 
123440.067934292, 231377.002294256, 433694.813090781, 812920.856596808
)), .Names = c("x", "y"), row.names = c(NA, -8L), class = "data.frame")

I am trying to fit this dataset with an exponential model:

y = 10000 * (e^(r * x))

and estimate r. When I use nonlinear regression with nls():

fit <- nls(y ~ (10000 * exp(r*x)), data=df)

I get the following error:

Error in getInitial.default(func, data, mCall = as.list(match.call(func,  : 
  no 'getInitial' method found for "function" objects

I also tried lm():

fit <- lm(log(y) ~ (10000 * exp(r*x)), data=df) 

but I get

Error in terms.formula(formula, data = data) : 
  invalid model formula in ExtractVars

How can I solve this? How can I fit the data to the exponential model I have?

Also, are there other approaches I could consider for fitting a population growth model? Is glm() reasonable?

Using lm()

Please read ?formula for how to specify a model formula correctly. I will proceed assuming you have read it.

First, after taking the log of both sides, your model becomes:

log(y) = log(10000) + r * x

The constant log(10000) is a known value, not something to be estimated. Such a known term is called an offset in lm.

You should use lm like this:

# "-1" in the formula will drop intercept
fit <- lm(log(y) ~ x - 1, data = df, offset = rep(log(10000), nrow(df)))

# Call:
#  lm(formula = log(y) ~ x - 1, data = df, offset = rep(log(10000), nrow(df)))

#  Coefficients:
#        x  
#  0.02618  
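
To see what this call computes, here is a minimal sketch. The names z, r_lm and r_closed are mine, not from the question. For a regression through the origin, least squares has the closed form sum(x * z) / sum(x^2), where z = log(y) - log(10000), and coef() should extract the same number from the fitted object.

```r
# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

fit <- lm(log(y) ~ x - 1, data = df, offset = rep(log(10000), nrow(df)))
r_lm <- unname(coef(fit)["x"])  # the estimated growth rate

# Closed form for a no-intercept least-squares fit of z on x
z <- log(df$y) - log(10000)
r_closed <- sum(df$x * z) / sum(df$x^2)

all.equal(r_lm, r_closed)
```
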

As you've spotted, fit is a list of length 13. See the "Value" section of ?lm for what its components are. Among them, the fitted values are in $fitted (short for $fitted.values), so you can draw your plot with:

plot(df)
lines(df$x, exp(fit$fitted), col = 2, lwd = 2)  ## red line

[Figure: the data points with the fitted curve overlaid in red]

Note the use of exp(fit$fitted): the model was fitted for log(y), so the fitted values have to be transformed back to the original scale.
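
If you want a smooth curve rather than straight segments between the eight data points, one option is to evaluate the fitted model on a finer grid. This is only a sketch: x_grid, y_grid and r_hat are names I made up, and the grid size of 200 is arbitrary.

```r
# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

fit <- lm(log(y) ~ x - 1, data = df, offset = rep(log(10000), nrow(df)))
r_hat <- coef(fit)[["x"]]

# Evaluate y = 10000 * exp(r * x) on a fine grid, already on the original scale
x_grid <- seq(min(df$x), max(df$x), length.out = 200)
y_grid <- 10000 * exp(r_hat * x_grid)

plot(df)
lines(x_grid, y_grid, col = 2, lwd = 2)  # red curve
```
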

Remark

As @BenBolker said, a simpler specification is:

fit <- lm(log(y/10000) ~ x - 1, data = df)

or

fit <- lm(log(y) - log(10000) ~ x - 1, data = df)

But the response variable is now log(y/10000) rather than log(y), so when you plot, you need:

lines(df$x, 10000 * exp(fit$fitted), col = 2, lwd = 2)
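
As a quick check that these specifications agree, the sketch below fits all three and compares them. The object names fit_offset, fit_ratio and fit_diff are mine. The slopes should match, and the fitted values should differ only by the constant log(10000).

```r
# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

fit_offset <- lm(log(y) ~ x - 1, data = df, offset = rep(log(10000), nrow(df)))
fit_ratio  <- lm(log(y / 10000) ~ x - 1, data = df)
fit_diff   <- lm(log(y) - log(10000) ~ x - 1, data = df)

# Same estimate of r from all three
all.equal(coef(fit_offset), coef(fit_ratio))
all.equal(coef(fit_ratio), coef(fit_diff))

# Fitted values of the offset model are on the log(y) scale,
# those of the other two are on the log(y / 10000) scale
all.equal(unname(fitted(fit_offset)), unname(fitted(fit_ratio)) + log(10000))
```
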

Using nls()

The correct way to use nls() is:

nls(y ~ 10000 * exp(r * x), data = df, start = list(r = 0.1))

Because non-linear curve fitting is iterative, it needs a starting value, which must be provided via the start argument.

Now, if you try this code, you will get:

Error in nls(y ~ 10000 * exp(r * x), data = df, start = list(r = 0.1)) : 
  number of iterations exceeded maximum of 50

The problem is that your data are exact, without noise. From ?nls:

Warning:

     *Do not use ‘nls’ on artificial "zero-residual" data.*

So nls() does not work on your toy data set df.
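
With some noise added, the same call should behave. The sketch below is an illustration under my own assumptions: multiplicative log-normal noise with sd = 0.05, an arbitrary seed, and a starting value taken from the log-linear fit instead of 0.1. None of these come from the question.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

# Assumed noise model: multiply each y by exp(N(0, 0.05^2))
noisy <- transform(df, y = y * exp(rnorm(nrow(df), sd = 0.05)))

# Use the log-linear estimate as the starting value for nls()
r0 <- coef(lm(log(y / 10000) ~ x - 1, data = noisy))[["x"]]

fit_nls <- nls(y ~ 10000 * exp(r * x), data = noisy, start = list(r = r0))
coef(fit_nls)
```
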

Let's go back and check the fitted model from lm():

fit$residuals
#            1             2             3             4             5 
#-2.793991e-16 -1.145239e-16 -2.005405e-15 -5.498411e-16  3.094618e-15 
#            6             7             8 
# 1.410007e-15 -1.099682e-15 -1.007937e-15

The residuals are essentially zero everywhere, so lm() fits these data exactly.


Follow-up

One last thing that I haven't been able to figure out is why the parameter r is not used in lm's formula specification.

There are some differences between the formulas used by lm and nls. You can think of it this way:

  • lm()'s formula is a model formula, documented in ?formula. It is fundamental in R: model fitting routines such as lm and glm use it, and many functions have a formula method, such as model.matrix, aggregate, boxplot, etc.
  • nls()'s formula is more like a function specification and is not widely used elsewhere. Many other functions that do non-linear iterations, such as optim, do not accept a formula but take a function directly. So just treat nls() as a special case.
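
The contrast can be sketched as follows. In the lm() model formula the coefficient is implicit and ends up named after its column, x. A general-purpose optimiser instead takes a plain R function of the parameter. I use optimize() here as a one-dimensional stand-in for optim; the function name sse, the search interval c(0, 0.1) and the tolerance are my own choices.

```r
# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

# Model formula: no "r" anywhere, the coefficient is named after the column
fit <- lm(log(y / 10000) ~ x - 1, data = df)
names(coef(fit))

# Function specification: r is an explicit argument
sse <- function(r) sum((df$y - 10000 * exp(r * df$x))^2)
opt <- optimize(sse, interval = c(0, 0.1), tol = 1e-10)
opt$minimum
```
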

So would it make sense to do it using the linear model? Simply what I am trying to model here is using Malthusian growth model.

Strictly speaking, given real population data (certainly with noise), using nls() for curve fitting, or glm(, family = poisson) for a GLM with a Poisson response, is better grounded than fitting a linear model. The glm() call for your data would be:

glm(y ~ x - 1, family = poisson(), data = df, offset = rep(log(10000), nrow(df)))

(You may need to learn what a GLM is first.) But expect warning messages on this toy data: it has no noise, and its y values are not the integer counts a Poisson response assumes.

However, in terms of computational cost, taking a log transform first and fitting a linear model is a clear win. Variable transformations are very common in statistical modelling, so there is no compelling reason to reject a linear model for estimating the population growth rate.

For a complete picture, I recommend you try all three approaches on real data (or noisy toy data). There will be some differences in estimation and prediction, but they are unlikely to be large.
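
A possible way to run that comparison on noisy toy data is sketched below. Everything about the simulation is my assumption: Poisson counts around the exponential curve, r_true set to the estimate reported above, and an arbitrary seed. The names sim, off and r_true are hypothetical.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

x <- seq(0, 168, by = 24)
r_true <- 0.02618  # assumed "true" rate, taken from the lm() estimate above

# Simulated counts with Poisson noise around 10000 * exp(r_true * x)
sim <- data.frame(x = x, y = rpois(length(x), 10000 * exp(r_true * x)))
off <- rep(log(10000), nrow(sim))

# 1. Linear model on the log scale
fit_lm <- lm(log(y) ~ x - 1, data = sim, offset = off)

# 2. Non-linear least squares, started from the lm() estimate
fit_nls <- nls(y ~ 10000 * exp(r * x), data = sim,
               start = list(r = coef(fit_lm)[["x"]]))

# 3. Poisson GLM with log link and known offset
fit_glm <- glm(y ~ x - 1, family = poisson(), data = sim, offset = off)

c(lm  = coef(fit_lm)[["x"]],
  nls = coef(fit_nls)[["r"]],
  glm = coef(fit_glm)[["x"]])
```
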

"Follow-follow-up"

Haha, thanks to @Ben again. For glm(), we can also try:

glm(y ~ x - 1 + offset(rep(log(10000), nrow(df))), family = gaussian(link = "log"), data = df)

To specify an offset, we can use either the offset argument of lm / glm, or the offset() function in the formula as Ben does.
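
A small sketch of the two offset forms side by side, using lm() for simplicity. The names off, fit_arg and fit_term are mine. I pass a full-length vector in both cases so that the offset has one value per row.

```r
# Toy data from the question
df <- data.frame(
  x = seq(0, 168, by = 24),
  y = c(10000, 18744.0760659189, 35134.0387564953, 65855.509495469,
        123440.067934292, 231377.002294256, 433694.813090781,
        812920.856596808)
)

off <- rep(log(10000), nrow(df))  # one offset value per observation

fit_arg  <- lm(log(y) ~ x - 1, data = df, offset = off)    # offset argument
fit_term <- lm(log(y) ~ x - 1 + offset(off), data = df)    # offset() in the formula

all.equal(coef(fit_arg), coef(fit_term))
```
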


 