简体   繁体   中英

Test statistics for significant change points from First Derivative of GAM smooth?

Questions:

  1. How can I get test statistics for significant change points? I'm also unsure what my degrees of freedom should be since derivatives() is sampling data 200 times.

  2. Am I correctly calculating the simultaneous CI? Gavin Simpson says that the functions where I got the signifD() function only use point wise intervals. I'm hoping that since I used 'derivatives()' first to calculate simultaneous CI that I'm ok here.

I am modeling electrofishing CPUE data to describe changes in trends of relative abundance over 20 years for a part of my master's thesis. To do this I am using the derivatives() function from the gratia package to calculate the first order derivative and simultaneous CI from my gamm() model fit. I have read many posts on various sites by Gavin Simpson (and others) about how to do this however there seems to have been a lot of changes over the years to the required functions and gratia package and I want to make sure that I am performing all these calculations correctly. Additionally, my committee members want me to report test statistics for significant change points and I'm not sure how to get these. Any help would be much appreciated. I included some data at the bottom.

I'm using gratia_0.7.3

Set k for gamm() based on number of years of data (in this case, k = 20).

(k_uor <- length(unique(dat$year)))

Fit gamm() model electrofishing catch per unit effort (ECPUE) by year using Gamma() distribution and log link via REML .

gamm_fit <- gamm((ECPUE + 1) ~ s(year, k = k_uor),
                      family = Gamma(link = "log"), 
                      data = dat,
                      method = "REML")

Create new_data for the years 2001 to 2020

new_data <- data.frame(matrix(seq(2001, 2020, 1),
                              ncol = 1))
names(new_data) <- "year"

Then use derivatives() from gratia to get the first order derivative with simultaneous CI from my gamm() model fit for the smooth term s(year) over the study period ( new_data ).

fd_gamm_fit <- derivatives(gamm_fit,
                  term = "s(year)",
                  newdata = new_data,
                  order = 1L,
                  type = "backward",
                  n = 200,
                  eps = 1e-07,
                  interval = "simultaneous",
                  n_sim = 10000
)

I then get predicted data p2 from my model fit during the study period ( new_data ).

p2 <- predict(gamm_fit$gam, newdata = new_data, type = "terms", se.fit = TRUE)

Use signifD() provided from Gavin Simpson to get the periods where the trend significantly increases or decreases.

signifD <- function(x, d, upper, lower, eval = 0) {
  miss <- upper > eval & lower < eval
  incr <- decr <- x
  want <- d > eval
  incr[!want | miss] <- NA
  want <- d < eval
  decr[!want | miss] <- NA
  list(incr = incr, decr = decr)
  }

m2.dsig <- signifD(p2$fit, 
                   d = fd_gamm_fit$derivative,
                   fd_gamm_fit$upper, 
                   fd_gamm_fit$lower)

which( !(is.na(m2.dsig$incr)) ) %>%
  fd_gamm_fit[.,]

which( !(is.na(m2.dsig$decr)) ) %>%
  fd_gamm_fit[.,]

ECPUE is significantly increasing during 2010 and 2015 and decreasing during 2003 and 2017

Plotting the derivative and simultaneous Confidence intervals.

draw(fd_gamm_fit,
     alpha = 0.2) +
  theme(axis.text.x = element_text(angle = -45)) +
  scale_x_continuous(breaks = seq(2001, 2020, 1),
                     labels = seq(2001, 2020, 1)) +
  labs(title = "Simultaneous CI first derivative") +
  geom_hline(yintercept = 0)

rm(p2,
   m2.dsig,
   new_data)

具有同步 CI 的一阶导数图

Plot GAMM fit on response scale

Predict data

new_data_pred <- data.frame(matrix(c(seq(2001,2020,0.1)),
                                   nrow = 191, ncol = 1, byrow = FALSE)
)
names(new_data_pred) <- "year"

pred1 <- predict(gamm_fit$gam, new_data_pred, type = "response", se.fit = TRUE)

years <- seq(2001,2020,0.1)
upper_2se <- (pred1$fit-1)+(pred1$se.fit*2)
lwr_2se <- (pred1$fit-1)-(pred1$se.fit*2)

pred_2 <- cbind(years, pred1$fit-1, upper_2se, lwr_2se) %>%
  as.data.frame()
names(pred_2) <- c("years", "pred", "upr_2se", "lwr_2se")

rm(new_data_pred,
   years,
   upper_2se,
   lwr_2se,
   pred1)

Create ggplot

ggplot(data = pred_2, aes(x = years, y = pred)) +
  labs(title = "") +
  scale_x_continuous(name = "Year",
                     breaks = seq(2001,2020,1),
                     labels = seq(2001,2020,1)) +
  scale_y_continuous(name = "ECPUE") +
  geom_line() + 
  geom_ribbon(aes(ymin = lwr_2se, ymax = upr_2se),
              colour = "grey",
              alpha = 0.2,
              linetype = "blank") +
  geom_line(data = pred_2 %>%
              subset(years >= 2010 &
                       years < 2011),
            aes(x = years, y = pred),
            colour = "blue") +
  geom_line(data = pred_2 %>%
              subset(years >= 2015 &
                       years < 2016),
            aes(x = years, y = pred),
            colour = "blue") +
  geom_line(data = pred_2 %>%
              subset(years >= 2003 &
                       years < 2004),
            aes(x = years, y = pred),
            colour = "red") +
  geom_line(data = pred_2 %>%
              subset(years >= 2017 &
                       years < 2018),
            aes(x = years, y = pred),
            colour = "red") +
  geom_point(data = dat_summary,
             aes(x = year, y = ECPUE))

在响应量表上绘制的显着变化点

Here is some sample data.

dat <- structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001, 2001, 
2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 
2002, 2002, 2002, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2003, 
2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2004, 2004, 
2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007, 
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020), 
    ECPUE = c(0, 0, 12, 0, 5.99999999999999, 0, 0, 24, 0, 5.99999999999999, 
    0, 12, 0, 5.99999999999999, 5.99999999999999, 12, 0, 0, 0, 
    18, 12, 5.99999999999999, 0, 35.9999999999999, 0, 0, 0, 0, 
    0, 0, 0, 5.99999999999999, 0, 4, 0, 0, 5.99999999999999, 
    5.99999999999999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 
    0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 4, 4, 12, 0, 8, 4, 0, 4, 0, 8, 0, 0, 0, 8, 4, 0, 0, 4, 
    0, 0, 4, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 4, 
    0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 
    0, 0, 0, 0, 4, 12, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 18, 12, 0, 4, 18, 0, 12, 4, 0, 0, 0, 0, 8, 0, 0, 0, 12, 
    4, 0, 0, 4, 0, 18, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 
    4, 48, 32, 36, 12, 16, 12, 0, 48, 0, 8, 36, 0, 0, 32, 4, 
    48, 0, 4, 20, 8, 4, 0, 36, 0, 0, 4, 0, 20, 4, 0, 0, 0, 0, 
    4, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 5.99999999999999, 
    5.99999999999999, 0, 0, 0, 0, 5.99999999999999, 5.99999999999999, 
    0, 0, 5.99999999999999, 0, 5.99999999999999, 0, 0, 16, 0, 
    0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0, 12, 0, 12, 4, 0, 0, 
    48, 0, 0, 4, 0, 5.99999999999999, 0, 0, 0, 4, 12, 0, 12, 
    5.99999999999999, 0, 4, 16, 0, 0, 12, 4, 0, 5.99999999999999, 
    18, 16, 4, 0, 12, 16, 0, 4, 0, 0, 4, 0, 4, 12, 0, 0, 0, 0, 
    4, 0, 4, 0, 0, 0, 0, 0, 12, 0, 4, 0, 4, 0, 5.99999999999999, 
    4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 5.99999999999999, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 12, 8, 
    12, 24, 0, 0, 0, 16, 0, 5.45454545454546, 24, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 32, 24, 0, 0, 4, 0, 0, 0, 5.99999999999999, 
    65.9999999999999, 16, 65.9999999999999, 65.9999999999999, 
    0, 0, 65.9999999999999, 0, 24, 16, 0, 0, 16, 0, 5.99999999999999, 
    0, 0, 24, 0, 0, 0, 0, 4, 0, 0, 0, 8, 4, 0, 0, 8, 4, 8, 4, 
    0, 0, 0, 0, 8, 0, 4, 12, 4, 0, 0, 0, 0, 4, 0, 8, 0, 0, 12, 
    0, 0, 0, 8, 0, 0, 0, 12, 4, 0, 0, 8, 8, 8, 4, 0, 0, 0, 4, 
    8, 0, 16, 0, 8, 0, 0, 4, 0, 4, 0, 4, 0, 8, 0, 0, 0, 0, 0, 
    4, 4, 0, 16, 0, 0, 0, 0, 0, 3.95604395604395, 0, 0, 20, 0, 
    0, 4, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 4, 8, 0, 0, 0, 0, 0, 
    4, 4, 0, 4, 0, 4, 0, 0, 0, 8, 8, 0, 0, 8, 4, 8, 0, 0, 4, 
    8, 4, 0, 3, 4, 4, 4, 5.14285714285715, 0, 10.9090909090909, 
    0, 12, 0, 0, 20, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 20, 0, 
    0, 5.14285714285715, 0)), class = "data.frame", row.names = c(NA, 
-644L))
dat_summary <- structure(list(year = 2001:2020, ECPUE = c(6.66666666666665, 
10.9333333333333, 0.714285714285713, 0, 0, 0.5, 0.639999999999999, 
1.26829268292683, 1.72727272727272, 3.15151515151515, 11.2, 3.71428571428571, 
4.54285714285714, 1.26923076923077, 4.96405919661733, 7.78947368421051, 
2.90909090909091, 3.0465724751439, 3.41463414634146, 3.55351545006718
)), row.names = c(NA, -20L), class = "data.frame")

There are a few things wrong here:

  1. I'd fit this with the tw() family using gam() not gamm() . You have some 0s, and these aren't allowed in the Gamma family so you are adding 1 to your response, but I don't see any reason why you need to use gamm() here.

  2. Better, assuming you have the effort variable would be to model these as actually counts with poisson() or nb() families and include the effort via an offset term - add + offset(log(effort)) to your model formula, where effort is the variable that normalises the counts to catch per unit effort. Absent that variable, the tw() is the best option.

  3. You are computing the confidence interval ( upper_2se etc) using predictions from the response scale and that is incorrect: as your plot shows, the interval contains negative values which just doesn't make sense. instead, use type = "link" when predicting, and then form the confidence interval as you do it now, but then backtransform the upper and lower limits and the fitted values to the response scale using the inverse of the link function (in all the cases mentioned — Gamma , tw , nb poisson — the link function is log() , so you'd want exp() for the inverse. That way you'll get a confidence interval that respects the properties of the data.

  4. I'd compute the derivative over a much finer grid than you do — you are estimating the derivative over a very small interval around the integer year values.

    This is actually causing you some problems as shown in the last plot; each of the blue spikes should be associated with a red decrease, but because of the crude grid of points you are evaluating the derivative at, you simply aren't seeing the significant decreases. That last red section is also likely spurious.

    If you want an instantaneous measure of the derivative you should do like you do when you plot the smooth and do it for a finer grid, say 100-200 points over the range min(year) to max(year) with seq(min(year), max(year), length = 200) say.

    Or, if you want the derivative over the entire year (?) you would have to increase eps to be some much larger value, like 1 , and I would suggest you do a central derivative so you aren't extrapolating too much beyond the start/end of the time series.

Remember these are derivatives on the link (log) scale; they aren't response level changes. Response derivatives is a bit more involved but should be in {gratia} in a few weeks.

I'll try to work up a full example from your data, but I also have some questions which I'll post as comments first. Consider this a WIP.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM