
Create a summary table for a continuous variable by a categorical variable

I am a beginner in R and have moved over from Stata/SPSS. In Stata I used to run tabular commands to generate a summary of a continuous variable by a grouping variable. Is there a way to do this in R?

I searched SO and found this thread: How to get Summary statistics by group

Hadley's map function did help me get the quartiles, mean and median, but I need more: specifically, the number of elements in each quartile and the number of elements in each level of the factor.
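Both counts can be obtained in base R. The following is a minimal sketch, assuming that "quartile" means the bins between each group's own 25th, 50th and 75th percentiles (R's default type-7 quantiles). `quartile_of` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Data from the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Number of observations in each level of the factor
level_counts <- table(df$group)

# Hypothetical helper: bin each value using the cut points of its own group.
# Bins are (-Inf, Q1], (Q1, median], (median, Q3], (Q3, Inf), so a value
# equal to a cut point goes into the lower bin.
quartile_of <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  1L + (x > q[1]) + (x > q[2]) + (x > q[3])
}

# ave() applies quartile_of() separately within each group
df$quartile <- factor(ave(df$dt, df$group, FUN = quartile_of), levels = 1:4)

# Rows are groups, columns are quartile bins; empty bins are shown as 0
quartile_counts <- table(group = df$group, quartile = df$quartile)

level_counts
quartile_counts
```

With this data, group C has no value in the third bin, because its median and third quartile are both 68, and `rowSums(quartile_counts)` gives back the per-level counts 4, 6, 6 and 8.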

Here's dummy code:

library(dplyr)  # provides the %>% pipe

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Split into one data.table per group and run summary() on each
df %>%
  data.table::as.data.table(.) %>%
  split(., by = c("group"), drop = TRUE, sorted = TRUE) %>%
  purrr::map(~summary(.$dt))

And

Hmisc::describe(df$group)

give two separate, disconnected outputs: one only describes the categorical variable, while the other only provides the six basic summary statistics. I need to see what is going on within each quartile.

I am using the describe() function from the Hmisc package above.
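The per-group output of summary() can also be collected into a single matrix with one row per level, with the per-level count added as the first column. This is a minimal base-R sketch that rebuilds `df` as in the dummy code; `per_group` is only an illustrative name.

```r
# Data from the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# summary() for each group; sapply() stacks the results as columns,
# t() turns them into one row per level of the factor
per_group <- t(sapply(split(df$dt, df$group), summary))

# Prepend the number of observations in each level
per_group <- cbind(n = as.vector(table(df$group)), per_group)
per_group
```

The columns are `n`, `Min.`, `1st Qu.`, `Median`, `Mean`, `3rd Qu.` and `Max.`, so the categorical and continuous summaries appear side by side.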

How can I do this using R? I'd sincerely appreciate any help.


Sample Output:

My desired output would look something like this, but grouped by each of the four levels of the categorical variable, so that I can analyze the continuous variable within each level. Right now the output is spread across three different commands, which makes it harder for me to understand what is happening.

Here are the commands:

 library(dplyr)  # for %>%, group_by(), count() and ntile()
 df %>% data.table::as.data.table(.) %>% split(., by = c("group"), drop = TRUE, sorted = TRUE) %>% purrr::map(~summary(.$dt))
 df %>% data.table::as.data.table(.) %>% split(., by = c("group"), drop = TRUE, sorted = TRUE) %>% purrr::map(~Hmisc::describe(.$dt))
 df %>% group_by(group) %>% count(quartile = ntile(dt, 4))

[The credit for the third command goes to one of the people who answered this question.]

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- c(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)

library(dplyr)

df %>% group_by(group) %>% summarise(mdt = mean(dt, na.rm = TRUE))
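The summarise() call above can be extended so that one table holds the count, the descriptive statistics and the per-bin counts for every group. This is a sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument). It reuses ntile() from the third command, which assigns bins by rank so that bin sizes differ by at most one; equal values can therefore end up in different bins, and the counts may differ from counts based on quantile cut points.

```r
library(dplyr)

# Data from the answer above
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- c(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

result <- df %>%
  group_by(group) %>%
  # rank-based bin (1 to 4) computed within each group
  mutate(quartile = ntile(dt, 4)) %>%
  summarise(
    n      = n(),
    mean   = mean(dt, na.rm = TRUE),
    sd     = sd(dt, na.rm = TRUE),
    min    = min(dt, na.rm = TRUE),
    q1     = quantile(dt, 0.25, na.rm = TRUE, names = FALSE),
    median = median(dt, na.rm = TRUE),
    q3     = quantile(dt, 0.75, na.rm = TRUE, names = FALSE),
    max    = max(dt, na.rm = TRUE),
    # number of observations in each ntile() bin
    n_q1   = sum(quartile == 1, na.rm = TRUE),
    n_q2   = sum(quartile == 2, na.rm = TRUE),
    n_q3   = sum(quartile == 3, na.rm = TRUE),
    n_q4   = sum(quartile == 4, na.rm = TRUE),
    .groups = "drop"
  )

result
```

Each row of `result` corresponds to one level of `group`, which gives the grouped, single-table layout described in the question.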
