How can I display confidence intervals for the levels of a factor variable using gtsummary::tbl_svysummary()?

Question

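I am using survey data from the National Electronic Injury Surveillance System (https://www.cpsc.gov/Research--Statistics/NEISS-Injury-Data) to study trends in consumer product injuries.

Using gtsummary and tbl_svysummary(), my goal is to create a descriptive table of summary measures of injuries. Because this is survey data, I would like to display the 95% confidence interval associated with each summary measure.

A previous post ("How to use (gtsummary) tbl_svysummary function to display confidence intervals for survey.design object?") provided a solution for producing confidence intervals for two-level factor variables. However, I am looking for a solution that produces confidence intervals for every level of a factor variable with two or more levels.

I borrowed a reproducible example from that post: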

library(gtsummary)
library(survey)

svy_trial <-
  svydesign(~1, data = trial %>% select(trt, response, death), weights = ~1) 

# compute a formatted 95% CI for a binary variable within each level of `by`
ci <- function(variable, by, data, ...) {
  svyby(as.formula(paste0("~", variable)), by = as.formula(paste0("~", by)), data, svyciprop, vartype = "ci") %>%
    tibble::as_tibble() %>%
    # format both bounds as percentages
    dplyr::mutate_at(dplyr::vars(ci_l, ci_u), ~style_number(., scale = 100) %>% paste0("%")) %>%
    dplyr::mutate(ci = stringr::str_glue("{ci_l}, {ci_u}")) %>%
    dplyr::select(all_of(c(by, "ci"))) %>%
    # one column per level of `by`, named add_stat_1, add_stat_2, ...
    tidyr::pivot_wider(names_from = all_of(by), values_from = ci) %>%
    purrr::set_names(paste0("add_stat_", seq_len(ncol(.))))
}

ci("response", "trt", svy_trial)
#> # A tibble: 1 x 2
#>   add_stat_1 add_stat_2
#>   <glue>     <glue>    
#> 1 21%, 40%   25%, 44%  

svy_trial %>%
  tbl_svysummary(by = "trt", missing = "no") %>%
  add_stat(everything() ~ "ci") %>%
  modify_table_body(
    dplyr::relocate, add_stat_1, .after = stat_1
  ) %>%
  modify_header(starts_with("add_stat_") ~ "**95% CI**") %>%
  modify_footnote(everything() ~ NA)

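[Screenshot 1: table from the previous post]

In the example above, the factor variable has two levels, and summary data are shown for only one of them.

How can the approach above be adapted so that both levels of a two-level factor variable are displayed, each with its own confidence interval?
How can this solution be generalized to factor variables with more than two levels (for example, an age variable categorized as <18, 18-25, 26-50, etc.)?
Finally, how can the solution also produce confidence intervals for continuous variables, in the same column as the confidence intervals for factor variables?

Here is an example of the table I would like to produce: [Screenshot 2: desired table output]

Apologies if this request for help does not follow good Stack Overflow etiquette (I am fairly new to this community). Thank you very much for your time and help!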

Answer 1

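I have prepared an example for factors with two or more levels, but without a by= variable (although the approach is similar). FYI, we have an open issue to support survey objects more thoroughly with a new function, add_ci.tbl_svysummary(), which will calculate CIs for both categorical and continuous variables. You can click the "subscribe" link here to be alerted when this feature is implemented: https://github.com/ddsjoberg/gtsummary/issues/965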
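For the continuous-variable part of the question, which the example that follows does not cover, a design-based mean and its CI can be computed directly with survey::svymean() and confint(). The following is only a minimal sketch under assumptions: it reuses the unit-weight design from the question, picks the `age` column of `trial` for illustration, and the names `age_mean`, `age_ci`, and `age_label` are made up here. Merging the resulting string into the table would still need a join like the one used for the factor levels.

```r
library(survey)

# `trial` ships with gtsummary; unit weights as in the question's example
svy_trial <- svydesign(~1, data = gtsummary::trial, weights = ~1)

# design-based mean of a continuous variable (missing values dropped)
age_mean <- svymean(~age, svy_trial, na.rm = TRUE)

# 95% confidence interval for that mean (a 1 x 2 matrix: lower, upper)
age_ci <- confint(age_mean, level = 0.95)

# format as one string, e.g. for a "95% CI" column
age_label <- sprintf("%.1f, %.1f", age_ci[1], age_ci[2])
age_label
```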

In the meantime, here is a code example:

library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.5.0'

svy <- survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) 

# put the CI in a tibble with the variable name
# first create a data frame with each variable and its values
df_result <- 
  tibble(variable = c("Class", "Sex", "Age", "Survived")) %>%
  # get the levels of each variable in a new column
  # adding them as a list to allow for different variable classes
  rowwise() %>%
  mutate(
    # level to be used to construct call
    level = unique(svy$variables[[variable]]) %>% as.list() %>% list(),
    # character version to be merged into table
    label = unique(svy$variables[[variable]]) %>% as.character() %>% as.list() %>% list()
  ) %>%
  unnest(c(level, label)) %>%
  mutate(
    label = unlist(label)
  )

# construct call to svyciprop
df_result$svyciprop <-
  map2(
    df_result$variable, df_result$label,
    function(variable, level) rlang::inject(survey::svyciprop(~I(!!rlang::sym(variable) == !!level), svy))
  )


# round/format the 95% CI
df_result <-
  df_result %>%
  rowwise() %>%
  mutate(
    ci = 
      svyciprop %>%
      attr("ci") %>%
      style_sigfig(scale = 100) %>%
      paste0("%", collapse = ", ")
  ) %>% 
  ungroup() %>%
  # keep variables needed in tbl
  select(variable, label, ci)


# construct gtsummary table with CI
tbl <- 
  svy %>%
  tbl_svysummary() %>%
  # merge in CI
  modify_table_body(
    ~.x %>%
      left_join(
        df_result, 
        by = c("variable", "label")
      )
  ) %>%
  # add a header
  modify_header(ci = "**95% CI**")

Created on 2021-12-04 by the reprex package (v2.0.1)
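The answer notes that the approach is similar when a by= variable is involved. One way this might look, as a rough sketch and not the answer author's code: combine the per-level indicator formula from the answer with svyby() from the question, so that each factor level gets one CI per stratum. The helper name `ci_level_by` and the choice of `grade` and `trt` from the `trial` data are assumptions made for illustration.

```r
library(survey)

# unit-weight design on the gtsummary `trial` data, as in the question
svy_trial <- svydesign(~1, data = gtsummary::trial, weights = ~1)

# hypothetical helper: 95% CI for one level of `variable`
# within each level of the `by` variable
ci_level_by <- function(design, variable, level, by) {
  # indicator formula, e.g. ~I(grade == "I")
  f <- as.formula(sprintf('~I(%s == "%s")', variable, level))
  res <- svyby(f, as.formula(paste0("~", by)), design, svyciprop, vartype = "ci")
  data.frame(
    variable = variable,
    label = level,
    by = as.character(res[[by]]),
    ci = sprintf("%.0f%%, %.0f%%", 100 * res$ci_l, 100 * res$ci_u)
  )
}

# one row per (factor level, treatment arm) combination
grade_ci <- do.call(
  rbind,
  lapply(
    levels(gtsummary::trial$grade),
    function(lvl) ci_level_by(svy_trial, "grade", lvl, "trt")
  )
)
grade_ci
```

The result is in long format. To place it in a table built with tbl_svysummary(by = "trt"), it would presumably need to be pivoted to one CI column per arm and then joined on `variable` and `label` inside modify_table_body(), as in the answer's example.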
