簡體   English   中英

GAM 的一階導數的顯着變化點的測試統計是否平滑?

[英]Test statistics for significant change points from First Derivative of GAM smooth?

問題:

  1. 如何獲得重要變化點的測試統計信息? 我也不確定我的自由度應該是多少,因為derivatives()對數據進行了 200 次采樣。

  2. 我是否正確計算了同步 CI? Gavin Simpson 說我得到signifD() function 的函數只使用逐點間隔。 我希望因為我首先使用“derivatives()”來計算我在這里沒問題的同步 CI。

我正在為電捕魚 CPUE 數據建模,以描述 20 年來相對豐度趨勢的變化,這是我碩士論文的一部分。 為此,我使用來自gratia package 的derivatives() function 來計算我的gamm() model 擬合的一階導數和同步 CI。 我已經閱讀了 Gavin Simpson(和其他人)在各個網站上關於如何執行此操作的許多帖子,但是多年來似乎對所需的功能和gratia package 進行了很多更改,我想確保我正在執行所有這些計算正確。 此外,我的委員會成員希望我報告重大變化點的測試統計數據,但我不確定如何獲得這些數據。 任何幫助將非常感激。 我在底部包含了一些數據。

我正在使用gratia_0.7.3

根據數據年數為gamm()設置k (在本例中,k = 20)。

(k_uor <- length(unique(dat$year)))

Fit gamm() model electrofishing catch per unit effort (ECPUE) by year 使用Gamma()分布和通過REMLlog鏈接。

gamm_fit <- gamm((ECPUE + 1) ~ s(year, k = k_uor),
                      family = Gamma(link = "log"), 
                      data = dat,
                      method = "REML")

為 2001 年到 2020 年創建new_data

new_data <- data.frame(matrix(seq(2001, 2020, 1),
                              ncol = 1))
names(new_data) <- "year"

然后使用來自gratiaderivatives()從我的gamm() model 獲得一階導數,同時 CI 適合研究期間的平滑項s(year) ( new_data )。

fd_gamm_fit <- derivatives(gamm_fit,
                  term = "s(year)",
                  newdata = new_data,
                  order = 1L,
                  type = "backward",
                  n = 200,
                  eps = 1e-07,
                  interval = "simultaneous",
                  n_sim = 10000
)

然后,我在研究期間 ( new_data ) 從我的 model 擬合得到預測數據p2

p2 <- predict(gamm_fit$gam, newdata = new_data, type = "terms", se.fit = TRUE)

使用Gavin Simpson提供的signifD()來獲取趨勢顯着增加或減少的時間段。

signifD <- function(x, d, upper, lower, eval = 0) {
  miss <- upper > eval & lower < eval
  incr <- decr <- x
  want <- d > eval
  incr[!want | miss] <- NA
  want <- d < eval
  decr[!want | miss] <- NA
  list(incr = incr, decr = decr)
  }

m2.dsig <- signifD(p2$fit, 
                   d = fd_gamm_fit$derivative,
                   fd_gamm_fit$upper, 
                   fd_gamm_fit$lower)

which( !(is.na(m2.dsig$incr)) ) %>%
  fd_gamm_fit[.,]

which( !(is.na(m2.dsig$decr)) ) %>%
  fd_gamm_fit[.,]

ECPUE 在 2010 年和 2015 年期間顯着增加,在 2003 年和 2017 年期間減少

繪制導數和聯立置信區間。

draw(fd_gamm_fit,
     alpha = 0.2) +
  theme(axis.text.x = element_text(angle = -45)) +
  scale_x_continuous(breaks = seq(2001, 2020, 1),
                     labels = seq(2001, 2020, 1)) +
  labs(title = "Simultaneous CI first derivative") +
  geom_hline(yintercept = 0)

rm(p2,
   m2.dsig,
   new_data)

具有同步 CI 的一階導數圖

Plot GAMM 符合響應量表

預測數據

new_data_pred <- data.frame(matrix(c(seq(2001,2020,0.1)),
                                   nrow = 191, ncol = 1, byrow = FALSE)
)
names(new_data_pred) <- "year"

pred1 <- predict(gamm_fit$gam, new_data_pred, type = "response", se.fit = TRUE)

years <- seq(2001,2020,0.1)
upper_2se <- (pred1$fit-1)+(pred1$se.fit*2)
lwr_2se <- (pred1$fit-1)-(pred1$se.fit*2)

pred_2 <- cbind(years, pred1$fit-1, upper_2se, lwr_2se) %>%
  as.data.frame()
names(pred_2) <- c("years", "pred", "upr_2se", "lwr_2se")

rm(new_data_pred,
   years,
   upper_2se,
   lwr_2se,
   pred1)

創建圖表

ggplot(data = pred_2, aes(x = years, y = pred)) +
  labs(title = "") +
  scale_x_continuous(name = "Year",
                     breaks = seq(2001,2020,1),
                     labels = seq(2001,2020,1)) +
  scale_y_continuous(name = "ECPUE") +
  geom_line() + 
  geom_ribbon(aes(ymin = lwr_2se, ymax = upr_2se),
              colour = "grey",
              alpha = 0.2,
              linetype = "blank") +
  geom_line(data = pred_2 %>%
              subset(years >= 2010 &
                       years < 2011),
            aes(x = years, y = pred),
            colour = "blue") +
  geom_line(data = pred_2 %>%
              subset(years >= 2015 &
                       years < 2016),
            aes(x = years, y = pred),
            colour = "blue") +
  geom_line(data = pred_2 %>%
              subset(years >= 2003 &
                       years < 2004),
            aes(x = years, y = pred),
            colour = "red") +
  geom_line(data = pred_2 %>%
              subset(years >= 2017 &
                       years < 2018),
            aes(x = years, y = pred),
            colour = "red") +
  geom_point(data = dat_summary,
             aes(x = year, y = ECPUE))

在響應量表上繪制的顯着變化點

這是一些示例數據。

dat <- structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001, 2001, 
2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 
2002, 2002, 2002, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2003, 
2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2004, 2004, 
2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2006, 
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007, 
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 
2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 
2014, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020), 
    ECPUE = c(0, 0, 12, 0, 5.99999999999999, 0, 0, 24, 0, 5.99999999999999, 
    0, 12, 0, 5.99999999999999, 5.99999999999999, 12, 0, 0, 0, 
    18, 12, 5.99999999999999, 0, 35.9999999999999, 0, 0, 0, 0, 
    0, 0, 0, 5.99999999999999, 0, 4, 0, 0, 5.99999999999999, 
    5.99999999999999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 
    0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 4, 4, 12, 0, 8, 4, 0, 4, 0, 8, 0, 0, 0, 8, 4, 0, 0, 4, 
    0, 0, 4, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 4, 
    0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 
    0, 0, 0, 0, 4, 12, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 18, 12, 0, 4, 18, 0, 12, 4, 0, 0, 0, 0, 8, 0, 0, 0, 12, 
    4, 0, 0, 4, 0, 18, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4, 
    4, 48, 32, 36, 12, 16, 12, 0, 48, 0, 8, 36, 0, 0, 32, 4, 
    48, 0, 4, 20, 8, 4, 0, 36, 0, 0, 4, 0, 20, 4, 0, 0, 0, 0, 
    4, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 5.99999999999999, 
    5.99999999999999, 0, 0, 0, 0, 5.99999999999999, 5.99999999999999, 
    0, 0, 5.99999999999999, 0, 5.99999999999999, 0, 0, 16, 0, 
    0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0, 12, 0, 12, 4, 0, 0, 
    48, 0, 0, 4, 0, 5.99999999999999, 0, 0, 0, 4, 12, 0, 12, 
    5.99999999999999, 0, 4, 16, 0, 0, 12, 4, 0, 5.99999999999999, 
    18, 16, 4, 0, 12, 16, 0, 4, 0, 0, 4, 0, 4, 12, 0, 0, 0, 0, 
    4, 0, 4, 0, 0, 0, 0, 0, 12, 0, 4, 0, 4, 0, 5.99999999999999, 
    4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 5.99999999999999, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 12, 8, 
    12, 24, 0, 0, 0, 16, 0, 5.45454545454546, 24, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 32, 24, 0, 0, 4, 0, 0, 0, 5.99999999999999, 
    65.9999999999999, 16, 65.9999999999999, 65.9999999999999, 
    0, 0, 65.9999999999999, 0, 24, 16, 0, 0, 16, 0, 5.99999999999999, 
    0, 0, 24, 0, 0, 0, 0, 4, 0, 0, 0, 8, 4, 0, 0, 8, 4, 8, 4, 
    0, 0, 0, 0, 8, 0, 4, 12, 4, 0, 0, 0, 0, 4, 0, 8, 0, 0, 12, 
    0, 0, 0, 8, 0, 0, 0, 12, 4, 0, 0, 8, 8, 8, 4, 0, 0, 0, 4, 
    8, 0, 16, 0, 8, 0, 0, 4, 0, 4, 0, 4, 0, 8, 0, 0, 0, 0, 0, 
    4, 4, 0, 16, 0, 0, 0, 0, 0, 3.95604395604395, 0, 0, 20, 0, 
    0, 4, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 4, 8, 0, 0, 0, 0, 0, 
    4, 4, 0, 4, 0, 4, 0, 0, 0, 8, 8, 0, 0, 8, 4, 8, 0, 0, 4, 
    8, 4, 0, 3, 4, 4, 4, 5.14285714285715, 0, 10.9090909090909, 
    0, 12, 0, 0, 20, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 20, 0, 
    0, 5.14285714285715, 0)), class = "data.frame", row.names = c(NA, 
-644L))
dat_summary <- structure(list(year = 2001:2020, ECPUE = c(6.66666666666665, 
10.9333333333333, 0.714285714285713, 0, 0, 0.5, 0.639999999999999, 
1.26829268292683, 1.72727272727272, 3.15151515151515, 11.2, 3.71428571428571, 
4.54285714285714, 1.26923076923077, 4.96405919661733, 7.78947368421051, 
2.90909090909091, 3.0465724751439, 3.41463414634146, 3.55351545006718
)), row.names = c(NA, -20L), class = "data.frame")

這里有一些錯誤:

  1. 我會使用gam()而不是gamm()將其與tw()系列相匹配。 您有一些 0,這些在Gamma系列中是不允許的,因此您要在響應中加 1,但我看不出有任何理由需要在這里使用gamm()

  2. 更好的是,假設你的effort變量將是 model 這些實際上是用poisson()nb()系列計算的,並通過偏移項包括努力 - 添加+ offset(log(effort))到你的 model 公式,其中effort是將每單位努力捕獲的計數標准化的變量。 沒有那個變量, tw()是最好的選擇。

  3. 您正在使用來自響應量表的預測來計算置信區間( upper_2se等),這是不正確的:正如您的 plot 所示,該區間包含負值,這是沒有意義的。 相反,在預測時使用type = "link" ,然后像現在一樣形成置信區間,然后使用 link function 的倒數(在所有提到的案例 — Gammatwnb poisson — 鏈接 function 是log() ,因此您需要exp()作為逆函數。這樣您將獲得尊重數據屬性的置信區間。

  4. 我會在比你更精細的網格上計算導數——你是在 integer 年值周圍的一個非常小的區間內估計導數。

    這實際上給你帶來了一些問題,如上一個 plot 所示; 每個藍色尖峰都應與紅色下降相關聯,但由於您評估導數的粗略網格點,您根本看不到顯着下降。 最后一個紅色部分也可能是虛假的。

    如果您想對導數進行瞬時測量,您應該像 plot 那樣平滑並為更精細的網格執行此操作,例如在min(year)max(year)范圍內使用seq(min(year), max(year), length = 200)說。

    或者,如果你想要全年的導數(?),你必須將eps增加到更大的值,比如1 ,我建議你做一個中心導數,這樣你就不會在開始時外推太多/時間序列結束。

請記住,這些是鏈接(對數)尺度上的導數; 它們不是響應級別的變化。 響應導數有點復雜,但應該會在幾周內出現。

我會嘗試從您的數據中得出一個完整的示例,但我也有一些問題,我會先作為評論發布。 將其視為 WIP。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM