How to exclusively add some values from duplicated rows in a dataframe in R?

Question

I have a data frame with 3 variables (section, age_group and population) and 3011 observations.

There are 12 different age groups in age_group, each identified by a number from 1 to 12: 1 is for 18-year-olds, 2 for 19-year-olds, 3 for ages 20-24, 4 for ages 25-29, ... and 12 for ages 65 and over.

For each section there are 12 rows with the population sorted by age group.
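The original table is not included here, so a hypothetical toy data frame with the same layout may help anyone testing an answer. This is a base R sketch. Only the column names come from the question. The section IDs (101 to 103) and the population counts are invented placeholders.

```r
# Hypothetical toy data in the layout described above: each section has
# 12 rows, one per age_group code (1 = 18, 2 = 19, 3 = 20-24, ..., 12 = 65+).
# Section IDs and population counts are invented for illustration.
df <- expand.grid(age_group = 1:12, section = c(101, 102, 103))
df <- df[, c("section", "age_group")]      # same column order as the question
df$population <- seq_len(nrow(df)) * 10    # placeholder counts

table(df$section)   # 12 rows per section
head(df, 12)        # first section, sorted by age_group
```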

I want to display each section by generation (z, millennial, x, baby boomers), that is, the population of each section summed per generation, where z = age_group 1:3, millennial = age_group 4:6, x = age_group 7:9 and boomers = age_group 10:12.
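The code-to-generation mapping above can be expressed with base R's cut(). This is a sketch of one possible approach. The breaks c(0, 3, 6, 9, 12) are an assumption chosen so that the right-closed intervals (0,3], (3,6], (6,9] and (9,12] match the four ranges.

```r
# Map the 12 age_group codes onto the four generations.
# cut() uses right-closed intervals by default: (0,3], (3,6], (6,9], (9,12].
age_group <- 1:12
generation <- cut(age_group,
                  breaks = c(0, 3, 6, 9, 12),
                  labels = c("z", "millennial", "x", "boomers"))
data.frame(age_group, generation)
```

Because the result is a factor, its levels keep the generations in chronological order instead of sorting them alphabetically.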

I have tried the plyr package, specifically ddply, i.e.

ddply(df, ~section, summarise, age_group = sum(age_group), population = sum(population))

But I don't know how to split each section into its generations without all of the age groups getting merged into one sum.
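Here is a base R alternative to the ddply attempt, offered as a sketch. The fix is to group by section and by a generation label, rather than summing the age_group codes themselves. The toy data frame (sections "A" and "B", population 1 to 24) is hypothetical. Building the labels with cut() is an assumed choice, not the only option.

```r
# Toy data (hypothetical): 2 sections x 12 age groups, population = 1..24
df <- data.frame(section    = rep(c("A", "B"), each = 12),
                 age_group  = rep(1:12, times = 2),
                 population = 1:24)

# Label each row with its generation, then sum population within every
# section/generation pair instead of over the whole section
df$generation <- cut(df$age_group, breaks = c(0, 3, 6, 9, 12),
                     labels = c("z", "millennial", "x", "boomers"))
res <- aggregate(population ~ section + generation, data = df, FUN = sum)
res
```

The result has one row per section/generation pair (8 rows here). For example, section A's z total is 1 + 2 + 3 = 6.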

Fragment of the table I'm using

Answer 1

It sounds like you need to add a generation column and then summarise the population by section and generation:

library(dplyr)

# Label each age_group code with its generation, then sum the
# population within every section/generation pair
df <- df %>%
  mutate(generation = case_when(age_group %in% 1:3   ~ "z",
                                age_group %in% 4:6   ~ "millennial",
                                age_group %in% 7:9   ~ "x",
                                age_group %in% 10:12 ~ "boomers")) %>%
  group_by(section, generation) %>%
  summarise(population = sum(population), .groups = "drop")
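If you would rather see one row per section with one column per generation, one possible base R sketch uses tapply() on the same kind of labeled data. The toy values are hypothetical, and tapply() is swapped in here in place of a dplyr/tidyr reshape.

```r
# Toy data (hypothetical), same layout as in the question
df <- data.frame(section    = rep(c("A", "B"), each = 12),
                 age_group  = rep(1:12, times = 2),
                 population = 1:24)
df$generation <- cut(df$age_group, breaks = c(0, 3, 6, 9, 12),
                     labels = c("z", "millennial", "x", "boomers"))

# One row per section, one column per generation (in factor-level order),
# each cell holding the summed population
wide <- tapply(df$population, list(df$section, df$generation), sum)
wide
```

With this toy data, section A comes out as 6, 15, 24, 33 and section B as 42, 51, 60, 69 for z, millennial, x and boomers respectively.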

solution1
0 2020-09-02 02:31:58
