Given the example data frame below, is it possible to iterate over each grouping column, and over the unique values in that column, to obtain a summary for each column?
sex <- c("M","F","M","M","F","F","F","M","M","F")
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)
df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)
In this example, I want to create a summary of missed days by sex and by school.
My expected output would be one data frame for sex and one for school, with output similar to below:
sex missed_days
M 13
F 27
school missed_days
north 12
central 5
south 23
I tried (without success):
for(i in seq_along(select(df,1:2)) {
output[[i]] <- sum(df$days_missed[[i]] )
}
Is there a way to accomplish what I am looking to do?
In base R you could do:
lapply(1:2, function(x) xtabs(days_missed ~ ., df[c(x, 3)]))
[[1]]
sex
F M
27 13
[[2]]
school
central north south
5 12 23
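The question asks for data frames, while xtabs returns table objects. A minimal sketch of one way to convert them, assuming the same df as in the question: as.data.frame on a table accepts a responseName argument, which is used here to name the summed column missed_days. The names tabs and dfs are introduced only for this illustration.

```r
# Same example data as in the question
sex <- c("M","F","M","M","F","F","F","M","M","F")
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)
df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)

# One xtabs table per grouping column (columns 1 and 2), as in the answer
tabs <- lapply(1:2, function(x) xtabs(days_missed ~ ., df[c(x, 3)]))

# Convert each table to a data frame; responseName sets the name of the value column
dfs <- lapply(tabs, as.data.frame, responseName = "missed_days")

dfs[[1]]  # data frame with columns sex and missed_days
dfs[[2]]  # data frame with columns school and missed_days
```

Note that the grouping column comes back as a factor by default; pass stringsAsFactors = FALSE to as.data.frame if character columns are preferred.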
Using tidyverse:
library(tidyverse)
map(df[-3],~xtabs(days_missed~.x,df))
$sex
.x
F M
27 13
$school
.x
central north south
5 12 23
If you must use summarise, then:
df %>%
summarise_at(vars(-days_missed), ~list(xtabs(days_missed~.x))) %>%
{t(.)[,1]}
$sex
.x
F M
27 13
$school
.x
central north south
5 12 23
Here is a tidyverse approach
library(tidyverse)
sex <- c("M","F","M","M","F","F","F","M","M","F")
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)
df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)
df %>%
group_by(sex) %>%
summarise(missed_day = sum(days_missed))
df %>%
group_by(school) %>%
summarise(missed_day = sum(days_missed))
If you want to apply the same summary to every other column:
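# group is a column name passed as a string, so it can be given to
# group_by_at() directly; the {{ }} operator is not needed here
simple_operation <- function(x, group) {
  x %>%
    group_by_at(group) %>%
    summarise(missed_day = sum(days_missed))
}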
variable_names <-
df %>%
select(-days_missed) %>%
names()
map(.x = variable_names,.f = ~ simple_operation(x = df,group = .))
In base R, you can use lapply along with tapply to get the sum of days_missed by group.
lapply(df[-ncol(df)], function(x) tapply(df$days_missed, x, sum))
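The tapply call above returns a named numeric vector for each column rather than a data frame. If data frames are wanted, as in the question, one possible base R variant is sketched below. It uses aggregate with a formula built by reformulate; the names group_cols and summaries are assumptions made for this example.

```r
# Same example data as in the question
sex <- c("M","F","M","M","F","F","F","M","M","F")
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)
df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)

# Every column except the one being summed is treated as a grouping column
group_cols <- setdiff(names(df), "days_missed")

# Build days_missed ~ <column> for each grouping column and sum within groups
summaries <- lapply(group_cols, function(g) {
  aggregate(reformulate(g, response = "days_missed"), data = df, FUN = sum)
})
names(summaries) <- group_cols

summaries$sex     # one row per sex, with the summed days_missed
summaries$school  # one row per school, with the summed days_missed
```

Naming the list by column makes it possible to pull out a single summary with summaries$sex instead of relying on position.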
Or using tidyverse:
library(dplyr)
cols <- c('sex', 'school')
purrr::map(cols, ~df %>% group_by_at(.x) %>% summarise(sum = sum(days_missed)))
#[[1]]
# A tibble: 2 x 2
# sex sum
# <chr> <dbl>
#1 F 27
#2 M 13
#[[2]]
# A tibble: 3 x 2
# school sum
# <chr> <dbl>
#1 central 5
#2 north 12
#3 south 23
This returns a list of dataframes.
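group_by_at is superseded in dplyr 1.0.0 and later. As a hedged sketch of the same idea with the newer across() and all_of() helpers (assuming dplyr >= 1.0.0 and purrr are installed), the list can also be named by column so each summary is easy to retrieve. The name summaries and the output column name missed_days are choices made for this example, not part of the original answer.

```r
library(dplyr)
library(purrr)

# Same example data as in the question
sex <- c("M","F","M","M","F","F","F","M","M","F")
school <- c("north","north","central","south","south","south","central","north","north","south")
days_missed <- c(5,1,2,0,7,1,3,2,4,15)
df <- data.frame(sex, school, days_missed, stringsAsFactors = FALSE)

cols <- c("sex", "school")

# set_names() names the vector by its own values, so map() returns a named list
summaries <- cols %>%
  set_names() %>%
  map(function(col) {
    df %>%
      group_by(across(all_of(col))) %>%
      summarise(missed_days = sum(days_missed), .groups = "drop")
  })

summaries$sex     # tibble with columns sex and missed_days
summaries$school  # tibble with columns school and missed_days
```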