简体   繁体   中英

How to fit and plot exponential decay function using ggplot2 and linear approximation

I am trying to fit exponential decay functions on data which has only few time points. I would like to use the exponential decay equation y = y0*e^(-r*time) in order to compare r (or eventually half-life) between datasets and factors. I have understood that using a linear fit instead of nls is a better alternative for this particular function [ 1 , 2 ], if I want to estimate the confidence intervals (which I do).

Copy this to get some example data:

x <- structure(list(Factor = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
1L, 3L, 3L, 3L, 2L, 2L, 4L, 4L, 4L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 1L, 1L, 1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"), 
time = c(0.25, 0.26, 0.26, 0.26, 0.27, 0.29, 0.29, 0.33, 
0.38, 0.38, 0.38, 0.39, 0.4, 0.4, 0.41, 0.45, 0.45, 0.45, 
0.45, 0.47, 0.51, 0.51, 0.52, 0.57, 0.57, 0.57, 0.57, 0.58, 
    0.58, 0.58, 0.6, 0.6, 0.6, 0.61, 0.61, 0.61, 0.62, 0.62, 
    0.64, 0.64, 0.67, 0.67, 0.67, 0.67, 0.69, 0.7, 0.7, 0.71, 
    0.76, 0.76, 0.77, 0.77, 0.79, 0.79, 0.8, 0.8, 0.83, 0.83, 
    0.84, 0.84, 0.86, 0.86, 0.87, 0.87, 18.57, 18.57, 18.57, 
    18.58, 18.69, 18.69, 18.7, 18.7, 18.7, 18.71, 18.71, 18.71, 
    18.74, 18.74, 18.74, 18.79, 18.85, 18.85, 18.86, 18.88, 18.89, 
    18.89, 18.89, 18.93, 18.93, 18.95, 18.95, 18.95, 18.96, 18.96, 
    18.96, 20.57, 20.57, 20.61, 20.62, 20.66, 20.67, 20.67, 20.67, 
    20.72, 20.72, 20.72, 21.18, 21.19, 21.19, 21.19, 21.22, 21.22, 
    21.22, 21.23, 21.25, 21.25, 21.25, 21.25, 87.58, 87.58, 87.64, 
    87.64, 87.65, 87.84, 87.85, 87.91, 87.91, 87.91, 89.27, 89.28, 
    89.28, 89.36, 89.36, 89.4, 89.4, 110.91, 112.19, 112.19, 
    112.2, 112.2, 112.24, 112.25, 112.25, 112.26, 185.6, 185.6, 
    185.63, 185.63, 185.64, 213, 234.96, 234.97, 234.97, 234.98, 
    235.01, 235.01, 235.02, 235.02), y = c(58.1, 42.9, 54.2, 
    45.3, 51.2, 44.4, 56.9, 53.4, 61.3, 49.3, 54.4, 55.6, 25.6, 
    48.1, 50.8, 54.7, 41.8, 46.2, 39.5, 51.7, 37.7, 43.1, 44.6, 
    48.4, 50.9, 62.5, 58.6, 47.8, 44.3, 55.6, 44.9, 49.1, 49.1, 
    60.3, 40.8, 57.6, 42.9, 60, 49.4, 54.1, 37.8, 46.5, 59, 64.3, 
    48, 54.3, 51.7, 59, 57.1, 29.4, 49.2, 50, 41.3, 40.5, 43.4, 
    48.6, 38.5, 35.7, 43.6, 60, 32, 27.3, 34.3, 44.4, 36.5, 25.4, 
    22.6, 25.5, 24.1, 18.9, 25, 5.9, 19.6, 15.7, 32.3, 14.3, 
    23.4, 29.4, 17, 18.3, 34.4, 26.4, 35.7, 22.6, 23.5, 19.3, 
    25.5, 34.7, 45.5, 38.1, 33.8, 47.9, 32.3, 32.1, 43, 27.8, 
    33.3, 25.5, 22.2, 29.2, 24.2, 22.8, 19.2, 31.6, 20.8, 26.4, 
    35.8, 50, 10.7, 24, 54.3, 67, 77.7, 51.7, 64.8, 49.3, 57.8, 
    43.2, 17, 17.4, 36.4, 60.2, 36, 4, 0, 0, 9.1, 2.9, 24.3, 
    18.8, 36, 16.3, 18.4, 17.1, 26.5, 29.3, 17.4, 23.1, 25.7, 
    32.7, 16.3, 14.6, 13.7, 16.2, 16.7, 21.9, 0, 0, 11.6, 8.6, 
    0, 3.7, 3.6, 5, 3.2, 0, 2.5, 5.7)), .Names = c("Factor", 
"time", "y"), row.names = c(NA, -158L), class = "data.frame")

I manage to do this using the standard logarithmic function log(y) = x (thanks to this example ), but fail when trying to fit several parameters in linear space.

summary(lm(log(y) ~ time, data = x, subset = Factor)) # I need the summary statistics to compare models
ggplot(x, aes(x = time, y = y, color = Factor)) + geom_point() + geom_smooth(method = "glm", family = gaussian(lin="log"), start=c(5,0))

在此输入图像描述

Here is what I have tried:

## Summary

log.dec.fun <- function(N, r, time) -r*time + log(N) # The function in linear format

summary(glm(y ~ log.dec.fun(N, r, time), data = x, subset = Factor, start = c(5,0)))
# Error in log.dec.fun(N, r, time) : object 'r' not found

predict(glm(y ~ log.dec.fun(N, r, time), data = x, start = c(5,0)))
# Error in log.dec.fun(N, r, time) : object 'r' not found

## Plot

ggplot(x, aes(x = time, y = y, color = Factor)) + geom_point() + geom_smooth(method = "glm", formula = y ~ log.dec.fun(N, r, time), start = c(5,0))
#Error in log.dec.fun(N, r, time) : object 'r' not found
#Error in if (nrow(layer_data) == 0) return() : argument is of length zero

I can manage to get quite satisfactory models using nls , but I have learned that calculating confidence intervals for nls functions verges upon magic and beginners should not even try doing that.

dec.fun <- function(N, r, time) N*exp(-r*time) ## The function in non-linear form
g <- c()
for(i in 1:nlevels(x$Factor)){
z <- subset(x, Factor == levels(x$Factor)[i])
g <- append(g, predict(nls(y ~ dec.fun(N, r, time), data = z, start = list(N = 5, r = 0))))}
x <- x[with(x, order(Factor, time)),]

x$modelled <- g

ggplot(x, aes(x = time, color = Factor)) + geom_point(aes(y = y)) + geom_line(aes(y = modelled))

在此输入图像描述 So my question is how to fit exponential decay functions using R, ggplot2 and linear approximation? There is an answer in SO , where @Joe Kington indicates that this is possible and provides the Python code. Unfortunately I do not understand Python.

I believe you simply need to allow for separate slopes and intercepts to be fit by your grouping variable Factor when you fit the model with the natural logarithm transformation for the response. I call this a separate lines model. Then you can predict and get confidence (or prediction) intervals on the log scale for each Factor , and back-transform to see the lines (much like the graphs in your original post from ggplot2 .

Example of a separate lines model in R:

fit1 = lm(y ~ time*Factor, data = x)
summary(fit1)

The output of this model will show the estimated intercept for the reference level of Factor , the estimated slope for the reference level, and the difference in intercepts and slopes between the reference level and all other levels.

Alternatively, you could code the separate lines model:

fit2 = lm(y ~ time + time:Factor - 1, data = x)
summary(fit2)

This will show you the estimated intercept and slope separately for each level of Factor in your output.

To make lines based on the model, you can use predict and then back-transform to the original scale. Assuming a natural log transformation (and adding the values to your original dataset):

(x$pred = exp(predict(fit1)) )

You can also calculate and exponentiate your confidence intervals to the original scale if that's what you need.

exp(predict(fit1, interval = "confidence"))

Organizationally, you may want to put these as columns in your original dataset, as well, which you could do a variety of ways. The simplest may be to simply cbind them to the dataset x .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM