Summarize categorical variables by numeric: gtsummary package
I need help writing R code with the gtsummary package to create a summary table with several categorical variables as rows, where the column side (the "by" variable) is a numeric variable, in my case age in years. In essence, I would like to summarize several categorical patient characteristics by their mean/median age.
As an example, using the "trial" data in this package, I would like the rows of the table to be the categorical variables (marker, stage, grade) and the by variable to be "age", giving the median age for each category of those variables.
Thank you for your help. Nelly
I am not 100% clear on what you're asking. I am guessing you want to summarize data by high age and low age (split at the median in the example below)?
First, you will want to create a categorical age variable.
    library(gtsummary)
    library(tidyverse)

    df_age_example <-
      trial %>%
      mutate(
        # create a categorical age variable split at the median
        # (patients with a missing age get NA here)
        age2 = ifelse(
          age >= median(age, na.rm = TRUE),
          "Age Above or at Median",
          "Age Below Median"
        )
      ) %>%
      # keep variables to be summarized
      select(age2, marker, grade)
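Before building the table, it can be worth checking how many patients land in each age group. The sketch below repeats the same median split and counts the groups; `age_groups` is just an illustrative name, not something from the package. Patients with a missing age end up with `NA` in `age2`, and as far as I know `tbl_summary()` drops rows where the `by` variable is missing, so this count shows how many rows that would affect.

```r
library(gtsummary)  # provides the trial data set
library(dplyr)

# Same median split as above, followed by a count per group.
# `age_groups` is a hypothetical name used only for this check.
age_groups <-
  trial %>%
  mutate(
    age2 = ifelse(
      age >= median(age, na.rm = TRUE),
      "Age Above or at Median",
      "Age Below Median"
    )
  ) %>%
  # count() keeps NA as its own row, so missing ages are visible
  count(age2)

age_groups
```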
Then you'll want to pass that data frame to tbl_summary() to summarize the data.
    tbl_summary(data = df_age_example, by = age2)
That will yield a table with one column per age group, summarizing marker and grade within each group.
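If what you are after is literally the median age within each category of the categorical variables, here is a rough sketch of that reading. It is a guess at the intent, not a confirmed answer to the question. The first part uses dplyr and tidyr to compute the medians directly (`median_age_by_level` is a made-up name), and the last line uses `tbl_continuous()`, which I believe is available in recent gtsummary releases (assumption: version 1.5 or later). I used stage and grade here, since both are categorical in the trial data.

```r
library(gtsummary)  # provides the trial data set
library(dplyr)
library(tidyr)

# Assumption: the goal is the median age within each level of each
# categorical variable, rather than a table split by age group.
median_age_by_level <-
  trial %>%
  select(age, stage, grade) %>%
  # convert factors to character so the two variables stack cleanly
  mutate(across(c(stage, grade), as.character)) %>%
  # one row per patient per categorical variable
  pivot_longer(
    cols = c(stage, grade),
    names_to = "variable",
    values_to = "level"
  ) %>%
  group_by(variable, level) %>%
  summarise(
    n = sum(!is.na(age)),                    # patients with a known age
    median_age = median(age, na.rm = TRUE),  # median age in this level
    .groups = "drop"
  )

median_age_by_level

# The same idea as a formatted gtsummary table
# (assumes a gtsummary version that includes tbl_continuous()).
tbl_continuous(data = trial, variable = age, include = c(stage, grade))
```

The data frame version is handy if you want the numbers for further processing, while the `tbl_continuous()` version gives a publication-style table.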
I hope this helps. Happy Coding!