具有 95% 引導置信區間的折線圖

Question

嗨，我使用MatchIt package 的遺傳匹配方法匹配了治療組和對照組的個體樣本。 結果變量是每個月的平均支出。 現在我正在嘗試生成一個折線圖來展示治療組和對照組的平均每月支出模式。 我想添加陰影區域來顯示這些線的引導置信區間。 我遇到的主要問題包括：

1. 成對引導

匹配樣本的自舉重采樣應基於每個匹配對，而不是每個個體。 謝天謝地，我從這篇文章中得到了解決方案。 代碼將在示例中提供。

2. 繪制引導置信區間

我不確定如何在圖表上 plot 引導置信區間。

這里我提供一個例子：

library(dplyr)
library(tidyr)
library(ggplot2)

ID <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L")
Pair <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6)
Treatment <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0)
Month_1 <- c(300, 150, 450, 100, 200, 300, 400, 600, 650, 150, 200, 400)
Month_2 <- c(400, 600, 650, 150, 200, 400, 500, 250, 700, 200, 300, 500)
Month_3 <- c(500, 250, 700, 200, 300, 500, 500, 250, 700, 200, 300, 500)
Month_4 <- c(600, 700, 650, 250, 500, 550, 600, 700, 650, 250, 500, 550)
Month_5 <- c(700, 200, 800, 300, 900, 800, 600, 700, 650, 250, 500, 550)

    df <- data.frame(ID, Pair, Treatment, Month_1, Month_2, Month_3, Month_4, Month_5)

> df

   ID Pair Treatment Month_1 Month_2 Month_3 Month_4 Month_5
1   A    1         1     300     400     500     600     700
2   B    1         0     150     600     250     700     200
3   C    2         1     450     650     700     650     800
4   D    2         0     100     150     200     250     300
5   E    3         1     200     200     300     500     900
6   F    3         0     300     400     500     550     800
7   G    4         1     400     500     500     600     600
8   H    4         0     600     250     250     700     700
9   I    5         1     650     700     700     650     650
10  J    5         0     150     200     200     250     250
11  K    6         1     200     300     300     500     500
12  L    6         0     400     500     500     550     550

在這個數據集中，“Month_1-5”表示每個人的每月支出； “治療”表示個體是在治療組（編碼為1）還是對照組（編碼為0）； “Pair”表示每個匹配的對。 例如，個體 A 和 B 在樣本對中，因為它們共享相同的對數“1”。 因此，在 bootstrap 重采樣之后，如果 A 出現在任何樣本中，B 也應該出現。

在這里，我提供了一個代碼來進行引導重采樣並計算治療組和對照組每個月的平均支出：

### 95% paired bootstrap confidence interval
library(boot)
df <- df %>%
  mutate(Pair = as.factor(Pair))
pair_ids <- levels(df$Pair)

# For mean of Month_1 in control group
est_control <- function(pairs, i) {
  
  # Compute number of times each pair is present
  numreps <- table(pairs[i])
  
  # For each pair p, copy corresponding data indices numreps[p] times
  ids <- unlist(lapply(pair_ids[pair_ids %in% names(numreps)],
                       function(p) rep(which(df$Pair == p),
                                       numreps[p])))

  # Subset df with paired bootstrapped ids
  md_boot <- df[ids,]
                
  # Estimation
  #boot_treat <- c(mean(df[df$Treatment == 0, "Month_1"]), mean(df[df$Treatment == 0, "Month_2"]),
  #                mean(df[df$Treatment == 0, "Month_3"]), mean(df[df$Treatment == 0, "Month_4"]),
  #                mean(df[df$Treatment == 0, "Month_5"]))
  boot_control <- mean(md_boot[md_boot$Treatment == 0, "Month_1"])

  # Return the mean
  return(boot_control)
}
set.seed(1234)
boot_est <- boot(pair_ids, est_fun, R = 10000)
boot.ci(boot.out = boot_est, type = "bca")

# For mean of Month_1 in treatment group
est_treat <- function(pairs, i) {
  
  # Compute number of times each pair is present
  numreps <- table(pairs[i])
  
  # For each pair p, copy corresponding data indices numreps[p] times
  ids <- unlist(lapply(pair_ids[pair_ids %in% names(numreps)],
                       function(p) rep(which(df$Pair == p),
                                       numreps[p])))
  
  # Subset df with paired bootstrapped ids
  md_boot <- df[ids,]
  
  # Estimation
  #boot_treat <- c(mean(df[df$Treatment == 0, "Month_1"]), mean(df[df$Treatment == 0, "Month_2"]),
  #                mean(df[df$Treatment == 0, "Month_3"]), mean(df[df$Treatment == 0, "Month_4"]),
  #                mean(df[df$Treatment == 0, "Month_5"]))
  boot_treat <- mean(md_boot[md_boot$Treatment == 1, "Month_1"])
  
  # Return the mean
  return(boot_treat)
}
set.seed(1234)
boot_est <- boot(pair_ids, est_treat, R = 10000)
boot.ci(boot.out = boot_est, type = "bca")

在這種情況下，對照組“Month_1”的自舉置信區間為 [158.3, 441.7]，治療組的自舉置信區間為 [250, 500]。

然后我計算對照組和治療組的平均每月支出，然后生成折線圖：

Monthly_spending <- df %>%
  group_by(Treatment) %>%
  summarise("1" = mean(Month_1, na.rm = T),
            "2" = mean(Month_2, na.rm = T),
            "3" = mean(Month_3, na.rm = T),
            "4" = mean(Month_4, na.rm = T),
            "5" = mean(Month_1, na.rm = T)) %>%
  pivot_longer(c("1":"5"), names_to = "Month", values_to = "Spending")

Monthly_spending %>%
  mutate(Treatment = as.factor(Treatment)) %>%
  ggplot(aes(x = Month, y = Spending, group = Treatment, color = Treatment)) + 
  geom_point() + 
  geom_line(aes(group = Treatment)) +
  ggtitle("Monthly Spending") + 
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_colour_discrete("Groups", labels = c("Control", "Treatment")) +
  theme(legend.position = "bottom")

但是，我不知道如何在這兩行周圍添加陰影區域，以指示我上面提供的代碼的引導置信區間。 或者，我可以一次又一次地運行每個時間點的代碼，然后在數據集中手動將其記錄到 plot 圖表上的間隔。 但是，我想知道是否有一種方法可以更有效地進行。 我將非常感謝您的幫助。

Answer 1

這似乎是實現最終目標的一種非常復雜的方法。 ggplot2 function mean-cl_boot采用輸入向量並返回通過快速引導方法計算的 y、ymin 和 ymax 值。 它也直接在summarize中工作，因此我們可以將您的整個代碼替換為：

df %>%
  pivot_longer(starts_with("Month"), names_to = "Month") %>%
  group_by(Month, Treatment) %>%
  summarize(mean_cl_boot(value)) %>%
  mutate(Month = as.numeric(sub("Month_", "", Month)),
         Treatment = c("Control", "Treatment")[Treatment + 1]) %>%
  ggplot(aes(Month, y, color = Treatment)) +
  geom_ribbon(aes(fill = Treatment, ymin = ymin, ymax = ymax), 
              alpha = 0.1, color = NA) +
  geom_line(size = 1) +
  geom_point(size = 3) +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "Monthly spending") +
  theme_minimal(base_size = 16) +
  theme(legend.position = "bottom")

具有 95% 引導置信區間的折線圖

問題描述

1 個解決方案

解決方案1
0 2022-09-08 19:31:37

具有 95% 引導置信區間的折線圖

問題描述

1 個解決方案

解決方案1 0 2022-09-08 19:31:37

解決方案1
0 2022-09-08 19:31:37