I am trying to summarize data across two variables, and the output from summarize is very chunky (at least in R Notebook output, where the table breaks over multiple pages). I'd like to have one variable as the rows of the summary output and the other as the columns, with each cell of the table holding the mean for that combination of row and column. Some example data:
dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = sample(1:2, size = 4, replace = TRUE),
  value = rnorm(12)
)
and then I would usually get my summary dataframe like this:
library(dplyr)
dat1 %>% group_by(category, age) %>% summarize(mean(value))
but in my actual data each of the variables has 10+ levels, so the table is very long and hard to read. I would prefer something like this, which I created using:
# the pipe must end a line, not start one, or R stops parsing after the first line
dat1 %>%
  group_by(category) %>%
  summarize(mean.age1 = mean(value[age == 1]),
            mean.age2 = mean(value[age == 2]))
There must be a better way than hand-coding each mean column?
You just need to use tidyr in addition to dplyr, and do something like this:
library(dplyr)
library(tidyr)
dat1 %>%
  group_by(category, age) %>%
  summarise(mean = mean(value)) %>%
  spread(age, mean, sep = '')
Output is as follows:
Source: local data frame [3 x 3]
Groups: category [3]

  category      age1      age2
*   <fctr>     <dbl>     <dbl>
1     catA 0.2930104 0.3861381
2     catB 0.5752186 0.1454201
3     catC 1.0845645 0.3117227
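In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same reshaping could be sketched as below. This is only an illustration under a few assumptions that are not in the original post: a fixed seed, an age column built with rep() so that both age levels are guaranteed to occur in every category, and the result stored in a variable called wide.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the example is reproducible

dat1 <- data.frame(
  category = rep(c("catA", "catB", "catC"), each = 4),
  age = rep(1:2, times = 6),  # assumption: deterministic ages, both levels in every category
  value = rnorm(12)
)

wide <- dat1 %>%
  group_by(category, age) %>%
  # one mean per category/age pair; drop grouping so the result is a plain tibble
  summarise(mean = mean(value), .groups = "drop") %>%
  # one row per category, one column per age level, named age1, age2, ...
  pivot_wider(names_from = age, values_from = mean, names_prefix = "age")

print(wide)
```

With 10+ levels in each variable, this should give one row per level of the first grouping variable and one column per level of the second, with no hand-written column definitions.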