How to exclusively add some values from duplicated rows in a dataframe in R?

I have a data frame with 3 variables (section, age_group and population) and 3011 observations.

There are 12 age groups in age_group, each identified by a number from 1 to 12: 1 is for 18-year-olds, 2 is for 19-year-olds, 3 is for ages 20-24, 4 is for ages 25-29... and 12 is for ages 65+.

For each section there are 12 rows with the population sorted by age group.

I want to have each section displayed by generation (z, millennial, x, baby boomers), where z = age groups 1:3, millennial = age groups 4:6, x = age groups 7:9 and boomers = age groups 10:12.

I have tried with ddply from the plyr package, i.e.

ddply(df, ~section, summarise, age_group = sum(age_group), population = sum(population))

But I don't know how to sum within each generation separately, without all of the age groups in a section being merged into a single sum.

Fragment of the table I'm using: [image not included]

It sounds like you need to add another column for generation and then sum population by section and generation:

library(dplyr)

df  = df %>%
  # map each age_group code (1 to 12) to its generation
  mutate(generation = case_when(age_group %in% c(1,2,3) ~ "z",
                                age_group %in% c(4,5,6) ~ "millennial",
                                age_group %in% c(7,8,9) ~ "x",
                                age_group %in% c(10,11,12) ~ "boomers")) %>%
  group_by(section, generation) %>%
  # total population per section and generation
  summarise(population = sum(population), .groups = "drop")
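
If you would rather not depend on dplyr, the same idea (label each row with a generation, then sum population per section and generation) can be sketched in base R with cut() and aggregate(). This is only a minimal sketch: it assumes the column names from the question (section, age_group, population), the toy data frame and its population values are made up for illustration, and by_generation is just an illustrative name.

```r
# Toy data (made-up values): 2 sections x 12 age groups
df <- data.frame(
  section    = rep(c(1, 2), each = 12),
  age_group  = rep(1:12, times = 2),
  population = c(1:12, 13:24) * 10
)

# Map age_group codes to generations:
# (0,3] -> z, (3,6] -> millennial, (6,9] -> x, (9,12] -> boomers
df$generation <- cut(
  df$age_group,
  breaks = c(0, 3, 6, 9, 12),
  labels = c("z", "millennial", "x", "boomers")
)

# Sum population within each section/generation pair
by_generation <- aggregate(population ~ section + generation,
                           data = df, FUN = sum)

# Sort so that each section's four generations appear together
by_generation <- by_generation[order(by_generation$section,
                                     by_generation$generation), ]
print(by_generation)
```

With 12 age groups per section this should give 4 rows per section, one per generation. Because cut() returns a factor, the generations keep the order z, millennial, x, boomers instead of being sorted alphabetically.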
