
Summarize categorical variables by numeric: gtsummary package

I need help writing R code with the gtsummary package to create a summary table in which several categorical variables form the rows and the column (the "by" variable) is numeric: in my case, age in years. In essence, I would like to summarize several categorical patient characteristics by mean or median age.

For example, using the package's "trial" data, I would like the rows of the table to be the categorical variables (marker, stage, grade) and the by variable to be "age", giving the median age for each category of those variables.

Thank you for your help. Nelly

I am not 100% clear on what you're asking. I am guessing you want to summarize data by high age and low age (split at the median in the example below)?

First, you will want to create a categorical age variable.

library(gtsummary)
library(dplyr) # provides mutate(), select(), and the %>% pipe

df_age_example <-
  trial %>%
  mutate(
    # create a categorical age variable split at the median;
    # rows with a missing age get NA here
    age2 = ifelse(
      age >= median(age, na.rm = TRUE),
      "Age Above or at Median",
      "Age Below Median"
    )
  ) %>%
  # keep variables to be summarized
  select(age2, marker, grade)

Then you'll want to pass that data frame to tbl_summary() to summarize the data.

tbl_summary(data = df_age_example, by = age2)

That will yield the table below.

[Image: the resulting summary table, with marker and grade summarized by age group]
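If the goal is instead the median age within each category, which is how the question's last sentence reads, a sketch along these lines may be closer. It assumes a gtsummary version that provides tbl_continuous(), and it leaves out marker because that column is continuous in the trial data. The object name tbl_age is only for illustration.

```r
library(gtsummary)

# Summarize the continuous variable `age` within each level of the
# categorical variables `stage` and `grade` from the built-in `trial` data.
# With the default settings, each row reports the median age and its
# interquartile range for that category.
tbl_age <- tbl_continuous(
  data = trial,
  variable = age,
  include = c(stage, grade)
)

tbl_age
```

This keeps age numeric rather than splitting it into two groups, so each category gets its own age summary instead of a count per age group.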

I hope this helps. Happy Coding!
