
Can You Iterate Through Columns AND Unique Variables of Each Column to create a summary in R?

Given the example data frame below, is it possible to iterate over each column, and over the unique values within each column, to obtain a summary per unique value for each column?

sex <- c("M","F","M","M","F","F","F","M","M","F") 
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)

df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)

In this example, I want to create a summary of missed days by sex and by school.

My expected output would be one data frame for sex and one for school, similar to the following:

sex        missed_days
M          13
F          27

school     missed_days
north      12
central    5
south      23

I tried (without success):

for(i in seq_along(select(df,1:2)) {
output[[i]] <-  sum(df$days_missed[[i]] )
}

Is there a way to accomplish what I am looking to do?

In base R you could do:

lapply(1:2,function(x)xtabs(days_missed~.,df[c(x,3)]))
[[1]]
sex
 F  M 
27 13 

[[2]]
school
central   north   south 
      5      12      23 
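The `xtabs` result above is a list of tables, while the question asks for data frames. As a minimal sketch of one way to get that shape in base R, `aggregate()` can be applied per grouping column. The names `group_cols` and `summaries` are hypothetical and chosen only for illustration, and the output column is renamed to `missed_days` to match the expected output in the question.

```r
# Same example data as in the question
df <- data.frame(
  sex = c("M","F","M","M","F","F","F","M","M","F"),
  school = c("north","north","central","south","south","south","central","north","north","south"),
  days_missed = c(5,1,2,0,7,1,3,2,4,15),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns to group by
group_cols <- c("sex", "school")

# One data frame per grouping column, each holding the sum of days_missed
summaries <- lapply(group_cols, function(col) {
  out <- aggregate(df$days_missed, by = df[col], FUN = sum)
  names(out) <- c(col, "missed_days")  # rename the default "x" column
  out
})
names(summaries) <- group_cols

summaries$sex
summaries$school
```

Naming the list elements lets each summary be pulled out by column name rather than by position.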

Using tidyverse:

library(tidyverse)
map(df[-3],~xtabs(days_missed~.x,df))

$sex
.x
 F  M 
27 13 

$school
.x
central   north   south 
      5      12      23 

If you must use summarise, then:

df %>% 
   summarise_at(vars(-days_missed), ~list(xtabs(days_missed~.x))) %>%
   {t(.)[,1]}

$sex
.x
 F  M 
27 13 

$school
.x
central   north   south 
      5      12      23 

Here is a tidyverse approach:

library(tidyverse)

sex <- c("M","F","M","M","F","F","F","M","M","F") 
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)

df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)

df %>% 
  group_by(sex) %>% 
  summarise(missed_day = sum(days_missed))

df %>% 
  group_by(school) %>% 
  summarise(missed_day = sum(days_missed))

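If you want to apply the same summary to every other column:

# Sum days_missed within each level of the column named by `group` (a string)
simple_operation <- function(x, group) {
  x %>% 
    group_by(across(all_of(group))) %>% 
    summarise(missed_day = sum(days_missed))
}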

variable_names <- 
  df %>% 
  select(-days_missed) %>% 
  names()

map(.x = variable_names,.f = ~ simple_operation(x = df,group = .))

In base R, you can use lapply along with tapply to get sum of days_missed by group.

lapply(df[-ncol(df)], function(x) tapply(df$days_missed, x, sum))

Or using tidyverse:

library(dplyr)

cols <- c('sex', 'school')
purrr::map(cols, ~df %>% group_by_at(.x) %>% summarise(sum = sum(days_missed)))


#[[1]]
# A tibble: 2 x 2
#  sex     sum
#  <chr> <dbl>
#1 F        27
#2 M        13

#[[2]]
# A tibble: 3 x 2
#  school    sum
#  <chr>   <dbl>
#1 central     5
#2 north      12
#3 south      23

This returns a list of dataframes.
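If a single table is easier to work with than a list, one possible alternative is to reshape the grouping columns into long form first and summarise once. This is only a sketch, not something taken from the answers above: it assumes `tidyr` is installed alongside `dplyr`, and the column names `variable`, `level`, and `missed_days` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  sex = c("M","F","M","M","F","F","F","M","M","F"),
  school = c("north","north","central","south","south","south","central","north","north","south"),
  days_missed = c(5,1,2,0,7,1,3,2,4,15),
  stringsAsFactors = FALSE
)

# Stack every grouping column into (variable, level) pairs, then sum once
long_summary <- df %>%
  pivot_longer(cols = -days_missed, names_to = "variable", values_to = "level") %>%
  group_by(variable, level) %>%
  summarise(missed_days = sum(days_missed), .groups = "drop")

long_summary

# Optionally split back into one data frame per original column
split(long_summary, long_summary$variable)
```

This relies on all grouping columns sharing a type (character here); columns of mixed types would need converting before `pivot_longer()`.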

