[英]Test statistics for significant change points from First Derivative of GAM smooth?
問題:
如何獲得重要變化點的測試統計信息? 我也不確定我的自由度應該是多少,因為derivatives()
對數據進行了 200 次采樣。
我是否正確計算了同步 CI? Gavin Simpson 說我得到signifD()
function 的函數只使用逐點間隔。 我希望因為我首先使用“derivatives()”來計算我在這里沒問題的同步 CI。
我正在為電捕魚 CPUE 數據建模,以描述 20 年來相對豐度趨勢的變化,這是我碩士論文的一部分。 為此,我使用來自gratia
package 的derivatives()
function 來計算我的gamm()
model 擬合的一階導數和同步 CI。 我已經閱讀了 Gavin Simpson(和其他人)在各個網站上關於如何執行此操作的許多帖子,但是多年來似乎對所需的功能和gratia
package 進行了很多更改,我想確保我正在執行所有這些計算正確。 此外,我的委員會成員希望我報告重大變化點的測試統計數據,但我不確定如何獲得這些數據。 任何幫助將非常感激。 我在底部包含了一些數據。
我正在使用gratia_0.7.3
根據數據年數為gamm()
設置k
(在本例中,k = 20)。
(k_uor <- length(unique(dat$year)))
Fit gamm()
model electrofishing catch per unit effort (ECPUE) by year 使用Gamma()
分布和通過REML
的log
鏈接。
gamm_fit <- gamm((ECPUE + 1) ~ s(year, k = k_uor),
family = Gamma(link = "log"),
data = dat,
method = "REML")
為 2001 年到 2020 年創建new_data
new_data <- data.frame(matrix(seq(2001, 2020, 1),
ncol = 1))
names(new_data) <- "year"
然后使用來自gratia
的derivatives()
從我的gamm()
model 獲得一階導數,同時 CI 適合研究期間的平滑項s(year)
( new_data
)。
fd_gamm_fit <- derivatives(gamm_fit,
term = "s(year)",
newdata = new_data,
order = 1L,
type = "backward",
n = 200,
eps = 1e-07,
interval = "simultaneous",
n_sim = 10000
)
然后,我在研究期間 ( new_data
) 從我的 model 擬合得到預測數據p2
。
p2 <- predict(gamm_fit$gam, newdata = new_data, type = "terms", se.fit = TRUE)
使用Gavin Simpson提供的signifD()
來獲取趨勢顯着增加或減少的時間段。
signifD <- function(x, d, upper, lower, eval = 0) {
miss <- upper > eval & lower < eval
incr <- decr <- x
want <- d > eval
incr[!want | miss] <- NA
want <- d < eval
decr[!want | miss] <- NA
list(incr = incr, decr = decr)
}
m2.dsig <- signifD(p2$fit,
d = fd_gamm_fit$derivative,
fd_gamm_fit$upper,
fd_gamm_fit$lower)
which( !(is.na(m2.dsig$incr)) ) %>%
fd_gamm_fit[.,]
which( !(is.na(m2.dsig$decr)) ) %>%
fd_gamm_fit[.,]
ECPUE 在 2010 年和 2015 年期間顯着增加,在 2003 年和 2017 年期間減少
繪制導數和聯立置信區間。
draw(fd_gamm_fit,
alpha = 0.2) +
theme(axis.text.x = element_text(angle = -45)) +
scale_x_continuous(breaks = seq(2001, 2020, 1),
labels = seq(2001, 2020, 1)) +
labs(title = "Simultaneous CI first derivative") +
geom_hline(yintercept = 0)
rm(p2,
m2.dsig,
new_data)
Plot GAMM 符合響應量表
預測數據
new_data_pred <- data.frame(matrix(c(seq(2001,2020,0.1)),
nrow = 191, ncol = 1, byrow = FALSE)
)
names(new_data_pred) <- "year"
pred1 <- predict(gamm_fit$gam, new_data_pred, type = "response", se.fit = TRUE)
years <- seq(2001,2020,0.1)
upper_2se <- (pred1$fit-1)+(pred1$se.fit*2)
lwr_2se <- (pred1$fit-1)-(pred1$se.fit*2)
pred_2 <- cbind(years, pred1$fit-1, upper_2se, lwr_2se) %>%
as.data.frame()
names(pred_2) <- c("years", "pred", "upr_2se", "lwr_2se")
rm(new_data_pred,
years,
upper_2se,
lwr_2se,
pred1)
創建圖表
ggplot(data = pred_2, aes(x = years, y = pred)) +
labs(title = "") +
scale_x_continuous(name = "Year",
breaks = seq(2001,2020,1),
labels = seq(2001,2020,1)) +
scale_y_continuous(name = "ECPUE") +
geom_line() +
geom_ribbon(aes(ymin = lwr_2se, ymax = upr_2se),
colour = "grey",
alpha = 0.2,
linetype = "blank") +
geom_line(data = pred_2 %>%
subset(years >= 2010 &
years < 2011),
aes(x = years, y = pred),
colour = "blue") +
geom_line(data = pred_2 %>%
subset(years >= 2015 &
years < 2016),
aes(x = years, y = pred),
colour = "blue") +
geom_line(data = pred_2 %>%
subset(years >= 2003 &
years < 2004),
aes(x = years, y = pred),
colour = "red") +
geom_line(data = pred_2 %>%
subset(years >= 2017 &
years < 2018),
aes(x = years, y = pred),
colour = "red") +
geom_point(data = dat_summary,
aes(x = year, y = ECPUE))
這是一些示例數據。
dat <- structure(list(year = c(2001, 2001, 2001, 2001, 2001, 2001, 2001,
2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002,
2002, 2002, 2002, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2003,
2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2004, 2004,
2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2006,
2006, 2006, 2006, 2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007,
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007,
2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009,
2009, 2009, 2009, 2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010,
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,
2010, 2010, 2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2012, 2012,
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012,
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012,
2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,
2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,
2014, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020),
ECPUE = c(0, 0, 12, 0, 5.99999999999999, 0, 0, 24, 0, 5.99999999999999,
0, 12, 0, 5.99999999999999, 5.99999999999999, 12, 0, 0, 0,
18, 12, 5.99999999999999, 0, 35.9999999999999, 0, 0, 0, 0,
0, 0, 0, 5.99999999999999, 0, 4, 0, 0, 5.99999999999999,
5.99999999999999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0,
0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0, 5.99999999999999,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999,
0, 4, 4, 12, 0, 8, 4, 0, 4, 0, 8, 0, 0, 0, 8, 4, 0, 0, 4,
0, 0, 4, 0, 0, 4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 4,
0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0,
0, 0, 0, 0, 4, 12, 0, 0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 18, 12, 0, 4, 18, 0, 12, 4, 0, 0, 0, 0, 8, 0, 0, 0, 12,
4, 0, 0, 4, 0, 18, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 4,
4, 48, 32, 36, 12, 16, 12, 0, 48, 0, 8, 36, 0, 0, 32, 4,
48, 0, 4, 20, 8, 4, 0, 36, 0, 0, 4, 0, 20, 4, 0, 0, 0, 0,
4, 0, 0, 0, 0, 0, 0, 5.99999999999999, 0, 0, 5.99999999999999,
5.99999999999999, 0, 0, 0, 0, 5.99999999999999, 5.99999999999999,
0, 0, 5.99999999999999, 0, 5.99999999999999, 0, 0, 16, 0,
0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0, 12, 0, 12, 4, 0, 0,
48, 0, 0, 4, 0, 5.99999999999999, 0, 0, 0, 4, 12, 0, 12,
5.99999999999999, 0, 4, 16, 0, 0, 12, 4, 0, 5.99999999999999,
18, 16, 4, 0, 12, 16, 0, 4, 0, 0, 4, 0, 4, 12, 0, 0, 0, 0,
4, 0, 4, 0, 0, 0, 0, 0, 12, 0, 4, 0, 4, 0, 5.99999999999999,
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 5.99999999999999, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5.99999999999999,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 12, 8,
12, 24, 0, 0, 0, 16, 0, 5.45454545454546, 24, 0, 0, 0, 0,
0, 0, 0, 0, 0, 32, 24, 0, 0, 4, 0, 0, 0, 5.99999999999999,
65.9999999999999, 16, 65.9999999999999, 65.9999999999999,
0, 0, 65.9999999999999, 0, 24, 16, 0, 0, 16, 0, 5.99999999999999,
0, 0, 24, 0, 0, 0, 0, 4, 0, 0, 0, 8, 4, 0, 0, 8, 4, 8, 4,
0, 0, 0, 0, 8, 0, 4, 12, 4, 0, 0, 0, 0, 4, 0, 8, 0, 0, 12,
0, 0, 0, 8, 0, 0, 0, 12, 4, 0, 0, 8, 8, 8, 4, 0, 0, 0, 4,
8, 0, 16, 0, 8, 0, 0, 4, 0, 4, 0, 4, 0, 8, 0, 0, 0, 0, 0,
4, 4, 0, 16, 0, 0, 0, 0, 0, 3.95604395604395, 0, 0, 20, 0,
0, 4, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 4, 8, 0, 0, 0, 0, 0,
4, 4, 0, 4, 0, 4, 0, 0, 0, 8, 8, 0, 0, 8, 4, 8, 0, 0, 4,
8, 4, 0, 3, 4, 4, 4, 5.14285714285715, 0, 10.9090909090909,
0, 12, 0, 0, 20, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 3, 20, 0,
0, 5.14285714285715, 0)), class = "data.frame", row.names = c(NA,
-644L))
dat_summary <- structure(list(year = 2001:2020, ECPUE = c(6.66666666666665,
10.9333333333333, 0.714285714285713, 0, 0, 0.5, 0.639999999999999,
1.26829268292683, 1.72727272727272, 3.15151515151515, 11.2, 3.71428571428571,
4.54285714285714, 1.26923076923077, 4.96405919661733, 7.78947368421051,
2.90909090909091, 3.0465724751439, 3.41463414634146, 3.55351545006718
)), row.names = c(NA, -20L), class = "data.frame")
這里有一些錯誤:
我會使用gam()
而不是gamm()
將其與tw()
系列相匹配。 您有一些 0,這些在Gamma
系列中是不允許的,因此您要在響應中加 1,但我看不出有任何理由需要在這里使用gamm()
。
更好的是,假設你的effort
變量將是 model 這些實際上是用poisson()
或nb()
系列計算的,並通過偏移項包括努力 - 添加+ offset(log(effort))
到你的 model 公式,其中effort
是將每單位努力捕獲的計數標准化的變量。 沒有那個變量, tw()
是最好的選擇。
您正在使用來自響應量表的預測來計算置信區間( upper_2se
等),這是不正確的:正如您的 plot 所示,該區間包含負值,這是沒有意義的。 相反,在預測時使用type = "link"
,然后像現在一樣形成置信區間,然后使用 link function 的倒數(在所有提到的案例 — Gamma
、 tw
、 nb
poisson
— 鏈接 function 是log()
,因此您需要exp()
作為逆函數。這樣您將獲得尊重數據屬性的置信區間。
我會在比你更精細的網格上計算導數——你是在 integer 年值周圍的一個非常小的區間內估計導數。
這實際上給你帶來了一些問題,如上一個 plot 所示; 每個藍色尖峰都應與紅色下降相關聯,但由於您評估導數的粗略網格點,您根本看不到顯着下降。 最后一個紅色部分也可能是虛假的。
如果您想對導數進行瞬時測量,您應該像 plot 那樣平滑並為更精細的網格執行此操作,例如在min(year)
到max(year)
范圍內使用seq(min(year), max(year), length = 200)
說。
或者,如果你想要全年的導數(?),你必須將eps
增加到更大的值,比如1
,我建議你做一個中心導數,這樣你就不會在開始時外推太多/時間序列結束。
請記住,這些是鏈接(對數)尺度上的導數; 它們不是響應級別的變化。 響應導數有點復雜,但應該會在幾周內出現。
我會嘗試從您的數據中得出一個完整的示例,但我也有一些問題,我會先作為評論發布。 將其視為 WIP。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.