Calculating cumulative probability in a dplyr pipeline (Kaplan-Meier survival function)

Question

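I am trying to create a Kaplan-Meier life table using a dplyr pipeline. I can't work out how to calculate the cumulative survival probability without using a for loop. Here is some example data.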

library(dplyr)  # dplyr re-exports tibble()

df <- tibble(
  months = c(1, 3, 9, 13, 17, 20),
  n_at_risk = c(10, 8, 7, 5, 3, 2),
  cond_prob_event = c(0.100, 0.125, 0.143, 0.200, 0.333, 0.500),
  cond_prob_surv = c(0.900, 0.875, 0.857, 0.800, 0.667, 0.50)
)

df

# A tibble: 6 × 4
  months n_at_risk cond_prob_event cond_prob_surv
   <dbl>     <dbl>           <dbl>          <dbl>
1      1        10           0.1            0.9  
2      3         8           0.125          0.875
3      9         7           0.143          0.857
4     13         5           0.2            0.8  
5     17         3           0.333          0.667
6     20         2           0.5            0.5

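Here, the cumulative survival probability is the product of the previous (lagged) cumulative survival probability and the current conditional survival probability. I can get the answer I'm looking for with a for loop: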

out <- vector(mode = "numeric", 6)

for (i in seq_along(df$cond_prob_surv)) {
  if (i == 1) {
    out[i] <- df$cond_prob_surv[i]
  } else {
    out[i] <- out[i - 1] * df$cond_prob_surv[i]
  }
}

df$cum_prob_survival <- out
df

# A tibble: 6 × 5
  months n_at_risk cond_prob_event cond_prob_surv cum_prob_survival
   <dbl>     <dbl>           <dbl>          <dbl>             <dbl>
1      1        10           0.1            0.9               0.9  
2      3         8           0.125          0.875             0.788
3      9         7           0.143          0.857             0.675
4     13         5           0.2            0.8               0.540
5     17         3           0.333          0.667             0.360
6     20         2           0.5            0.5               0.180

However, I would really like to find a dplyr-only solution. Any help is greatly appreciated!

Answer 1

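We can use cumprod here: the cumulative survival probability at each row is the running product of the conditional survival probabilities up to that row.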

library(dplyr)
df <- df %>% 
    mutate(cum_prob_survival = cumprod(cond_prob_surv))

-output

df
# A tibble: 6 × 5
  months n_at_risk cond_prob_event cond_prob_surv cum_prob_survival
   <dbl>     <dbl>           <dbl>          <dbl>             <dbl>
1      1        10           0.1            0.9               0.9  
2      3         8           0.125          0.875             0.788
3      9         7           0.143          0.857             0.675
4     13         5           0.2            0.8               0.540
5     17         3           0.333          0.667             0.360
6     20         2           0.5            0.5               0.180
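If the conditional survival probability is not already stored, it can be derived in the same mutate call, since later expressions in mutate can refer to columns created earlier in that call. The following is a minimal sketch using the question's data; the name life_table is an assumption introduced only for this illustration.

```r
library(dplyr)

# Example data from the question, without the precomputed survival column.
# `life_table` is a hypothetical name used only in this sketch.
life_table <- tibble(
  months = c(1, 3, 9, 13, 17, 20),
  n_at_risk = c(10, 8, 7, 5, 3, 2),
  cond_prob_event = c(0.100, 0.125, 0.143, 0.200, 0.333, 0.500)
)

life_table <- life_table %>%
  mutate(
    # conditional survival is the complement of the conditional event probability
    cond_prob_surv = 1 - cond_prob_event,
    # running product of the column created on the previous line
    cum_prob_survival = cumprod(cond_prob_surv)
  )
```

This assumes the rows are already sorted by months; if they are not, an arrange(months) step before mutate would be needed for the running product to follow time order.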

Another option is accumulate from purrr:

library(purrr)
df <- df %>% 
     mutate(cum_prob_survival = accumulate(cond_prob_surv, `*`))

-output

df
# A tibble: 6 × 5
  months n_at_risk cond_prob_event cond_prob_surv cum_prob_survival
   <dbl>     <dbl>           <dbl>          <dbl>             <dbl>
1      1        10           0.1            0.9               0.9  
2      3         8           0.125          0.875             0.788
3      9         7           0.143          0.857             0.675
4     13         5           0.2            0.8               0.540
5     17         3           0.333          0.667             0.360
6     20         2           0.5            0.5               0.180
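To make the link with the for loop in the question explicit, the multiplication can be written as a named two-argument function: accumulate passes the running result as the first argument and the next element as the second. This is only a sketch; km_step is a hypothetical helper name, not part of purrr.

```r
library(purrr)

# Conditional survival probabilities from the question
cond_prob_surv <- c(0.900, 0.875, 0.857, 0.800, 0.667, 0.500)

# Hypothetical helper: prev is the cumulative survival so far,
# cur is the current conditional survival probability
km_step <- function(prev, cur) prev * cur

# Same recurrence as out[i] <- out[i - 1] * cond_prob_surv[i]
cum_prob_survival <- accumulate(cond_prob_surv, km_step)
```

For a plain running product cumprod is the simpler choice; accumulate is mainly useful when the update step is something other than a built-in cumulative function.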

Answer 2

A base R option using Reduce

transform(
  df,
  cum_prob_survival = Reduce(`*`, cond_prob_surv, accumulate = TRUE)
)

gives

  months n_at_risk cond_prob_event cond_prob_surv cum_prob_survival
1      1        10           0.100          0.900         0.9000000
2      3         8           0.125          0.875         0.7875000
3      9         7           0.143          0.857         0.6748875
4     13         5           0.200          0.800         0.5399100
5     17         3           0.333          0.667         0.3601200
6     20         2           0.500          0.500         0.1800600
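Note that transform returns a modified copy rather than changing df in place, so the result has to be assigned to keep the new column. The sketch below, which uses a reduced copy of the question's data and the hypothetical names df_base and agrees, stores the result and compares it against cumprod.

```r
# Reduced copy of the question's data as a plain data frame
df <- data.frame(
  months = c(1, 3, 9, 13, 17, 20),
  cond_prob_surv = c(0.900, 0.875, 0.857, 0.800, 0.667, 0.500)
)

# transform() returns a new data frame; df itself is left unchanged
df_base <- transform(
  df,
  cum_prob_survival = Reduce(`*`, cond_prob_surv, accumulate = TRUE)
)

# Compare the Reduce() result with cumprod(), allowing for floating-point tolerance
agrees <- isTRUE(all.equal(df_base$cum_prob_survival, cumprod(df$cond_prob_surv)))
```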

Answer 1
2 2021-11-16 21:13:31

Answer 2
1 2021-11-16 21:17:14
