New column returns greatest value in grouped data

Question

When an ID has more than one AGE within the same year and month, I want to place the greatest observed AGE for that group in a new column "MAX_AGE".

library(tidyverse)

ID <- c(1,1,1,2,2,2,2,3,3)
YEAR <- c(2019,2019,2019,2019,2019,2019,2019,2019,2019)
MONTH <- c(3,3,3,6,6,6,7,2,2)
AGE <- c(18,18,19,10,10,11,11,33,33)

tb <- tibble(ID, YEAR, MONTH, AGE)

tb %>%
    group_by(ID, YEAR, MONTH) %>%
    mutate(max_age = case_when(n_distinct(AGE) != 1 ~ top_n(1,AGE),
                               n_distinct(AGE) == 1 ~ as.numeric(AGE),
                               TRUE ~ NA_character_))

I'm getting the following error. Any help understanding/troubleshooting is greatly appreciated. I'd like a solution using dplyr if possible.

Error in UseMethod("tbl_vars") : 
  no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"

Answer 1

Can't you just do this?

tb %>% group_by(ID, YEAR, MONTH) %>% mutate(max_age = max(AGE))
#> # A tibble: 9 x 5
#> # Groups:   ID, YEAR, MONTH [4]
#>      ID  YEAR MONTH   AGE max_age
#>   <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1     1  2019     3    18      19
#> 2     1  2019     3    18      19
#> 3     1  2019     3    19      19
#> 4     2  2019     6    10      11
#> 5     2  2019     6    10      11
#> 6     2  2019     6    11      11
#> 7     2  2019     7    11      11
#> 8     3  2019     2    33      33
#> 9     3  2019     2    33      33
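
As a small follow-up sketch on the same sample data: the result above is still grouped, so it may be worth calling ungroup() once the column has been added. The na.rm = TRUE argument is only an assumption about real data (the sample has no missing AGE values) and can be dropped if it does not apply.

```r
library(dplyr)

# Same sample data as in the question
tb <- tibble(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  YEAR  = rep(2019, 9),
  MONTH = c(3, 3, 3, 6, 6, 6, 7, 2, 2),
  AGE   = c(18, 18, 19, 10, 10, 11, 11, 33, 33)
)

res <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  # na.rm = TRUE only matters if AGE contains NA (assumed, not in the sample)
  mutate(max_age = max(AGE, na.rm = TRUE)) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup()

res$max_age
#> [1] 19 19 19 11 11 11 11 33 33
```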

Answer 2

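top_n() operates on a data frame (its first argument) and returns a tibble, so it cannot be applied to a single column inside mutate(). In top_n(1, AGE) the number 1 is taken as the data, which is why the error mentions an object of class "double". Furthermore, you do not need to cast AGE to numeric, as it is already of that type. Finally, because you want a numeric result, you need to use NA_real_ rather than NA_character_.

You can modify your code this way: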

tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = case_when(n_distinct(AGE) != 1 ~ max(AGE),
                             n_distinct(AGE) == 1 ~ AGE,
                             TRUE ~ NA_real_))
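
To illustrate the point about top_n(), here is a minimal sketch on the question's sample data. It shows top_n() used where it fits, as a step in the pipeline that filters rows per group, and then checks that the case_when() version yields the same column as a plain max(AGE). The object names top_rows, via_case_when and via_max are only illustrative.

```r
library(dplyr)

# Same sample data as in the question
tb <- tibble(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  YEAR  = rep(2019, 9),
  MONTH = c(3, 3, 3, 6, 6, 6, 7, 2, 2),
  AGE   = c(18, 18, 19, 10, 10, 11, 11, 33, 33)
)

# top_n() receives the (grouped) data frame from the pipe and keeps
# the rows with the largest AGE in each group; tied rows are all kept
top_rows <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  top_n(1, AGE) %>%
  ungroup()

# The corrected case_when() version from this answer
via_case_when <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = case_when(n_distinct(AGE) != 1 ~ max(AGE),
                             n_distinct(AGE) == 1 ~ AGE,
                             TRUE ~ NA_real_)) %>%
  ungroup()

# The shorter max() version
via_max <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = max(AGE)) %>%
  ungroup()

# Compare the two new columns
all.equal(via_case_when$max_age, via_max$max_age)
```

Since both branches of the case_when() end up returning the group maximum, the plain max(AGE) form is probably the simpler choice. In dplyr 1.0.0 and later, top_n() is superseded by slice_max().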

solution1
2 ACCEPTED 2020-02-04 16:19:26

solution2
1 2020-02-04 16:17:57
