I have a data frame with 3 variables (section, age_group and population) and 3011 observations.
There are 12 different age groups in age_group, each identified by a number from 1 to 12: 1 is for 18-year-olds, 2 is for 19-year-olds, 3 is for ages 20-24, 4 is for ages 25-29... and 12 is for ages 65+.
For each section there are 12 rows with the population sorted by age group.
I want each section broken down by generation (z, millennial, x, baby boomers), where z = age groups 1:3, millennial = age groups 4:6, x = age groups 7:9, and boomers = age groups 10:12.
I have tried this with ddply (which comes from the plyr package, not dplyr), i.e.
ddply(df, ~section, summarise, age_group = sum(age_group), population = sum(population))
But I don't know how to separate the generations without all of the age groups in a section getting merged into a single sum.
It sounds like you need to add another column for generation and then summarise by generation and age_group:
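library(dplyr)

# Label each row with its generation, based on the age_group code (1 to 12)
df = df %>%
  mutate(generation = case_when(age_group %in% c(1,2,3) ~ "z",
                                age_group %in% c(4,5,6) ~ "millennial",
                                age_group %in% c(7,8,9) ~ "x",
                                age_group %in% c(10,11,12) ~ "boomers")) %>%
  # Count the rows in each generation / age_group combination
  group_by(generation, age_group) %>%
  summarise(num = n(), .groups = "drop")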
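Note that n() counts rows, not people. If what you want is the population total per section and generation, which is how I read the question, a variant along these lines might be closer. This is only a sketch: the toy data frame and the name by_generation are made up for illustration, and it assumes the column is called age_group as in the question.

```r
library(dplyr)

# Hypothetical toy data: 2 sections x 12 age groups, made-up populations
df <- data.frame(
  section    = rep(c(1, 2), each = 12),
  age_group  = rep(1:12, times = 2),
  population = c(1:12, (1:12) * 10)
)

by_generation <- df %>%
  # Map the 12 age_group codes onto the 4 generations from the question
  mutate(generation = case_when(
    age_group %in% 1:3   ~ "z",
    age_group %in% 4:6   ~ "millennial",
    age_group %in% 7:9   ~ "x",
    age_group %in% 10:12 ~ "boomers"
  )) %>%
  # One row per section and generation, summing population within each
  group_by(section, generation) %>%
  summarise(population = sum(population), .groups = "drop")

print(by_generation)
```

Grouping by section as well as generation keeps the sections apart, so each section ends up with four rows (one per generation) instead of one merged sum.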