Summarize categorical variables by numeric: gtsummary package

Question

I need help writing R code with the gtsummary package to create a summary table in which several categorical variables form the rows and the column (the "by" variable) is numeric, in my case age in years. In essence, I would like to summarize several categorical patient characteristics by their mean or median age.

For example, using the package's "trial" data, I would like the categorical variables (marker, stage, grade) on the rows of the table and "age" as the by variable, giving the median age for each category of those variables.

Thank you for your help. Nelly

Answer 1

I am not 100% clear on what you're asking. I am guessing you want to summarize data by high age and low age (split at the median in the example below)?

First, you will want to create a categorical age variable.

library(gtsummary)
library(tidyverse)

df_age_example <-
  trial %>%
  mutate(
    # create a categorical age variable split at the median
    age2 = ifelse(
      age >= median(age, na.rm = TRUE),
      "Age Above or at Median",
      "Age Below Median"
    )
  ) %>%
  # keep variables to be summarized
  select(age2, marker, grade)
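
Before building the table, it may be worth checking how many patients land in each group. The following is a minimal sketch that recreates the same age2 split on trial and counts the groups; the name age_group_counts is only illustrative. Patients with a missing age get NA for age2, and tbl_summary() drops rows whose by value is missing, so they will not appear in either column of the table.

```r
library(gtsummary)  # provides the trial example data
library(dplyr)

# recreate the median split and count patients per group;
# a missing age yields NA because ifelse() propagates missing values
age_group_counts <-
  trial %>%
  mutate(
    age2 = ifelse(
      age >= median(age, na.rm = TRUE),
      "Age Above or at Median",
      "Age Below Median"
    )
  ) %>%
  count(age2)

age_group_counts
```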

Then you'll want to pass that data frame to tbl_summary() to summarize the data.

tbl_summary(data = df_age_example, by = age2)

That will yield a summary table of marker and grade, with one column for each age group.
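
If the goal is instead the median age within each category of the row variables, which is another way to read the question, a sketch along the following lines may be closer. It assumes a gtsummary release recent enough to include tbl_continuous(), and it uses only stage and grade, since marker is a continuous measurement in trial. The dplyr/tidyr part computes the same medians by hand as a cross-check; all object names here are illustrative.

```r
library(gtsummary)
library(dplyr)
library(tidyr)

# median age for each level of stage and grade, computed by hand
median_age_by_level <-
  trial %>%
  select(age, stage, grade) %>%
  # convert factors to character so the two columns can be stacked
  mutate(across(c(stage, grade), as.character)) %>%
  pivot_longer(
    cols = c(stage, grade),
    names_to = "variable",
    values_to = "level"
  ) %>%
  group_by(variable, level) %>%
  summarise(
    n = n(),
    median_age = median(age, na.rm = TRUE),
    .groups = "drop"
  )

# the same idea as a formatted table: age summarized within each category
# (assumes tbl_continuous() is available in the installed gtsummary)
tbl_age_by_level <-
  tbl_continuous(
    data = trial,
    variable = age,
    include = c(stage, grade)
  )
```

Printing tbl_age_by_level displays the table; by default tbl_continuous() reports the median with the interquartile range, which can be compared against median_age_by_level.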

I hope this helps. Happy Coding!
