When an ID has more than one AGE within the same year and month, I want to place the greatest observed AGE in a new column, "MAX_AGE".
library(tidyverse)
ID <- c(1,1,1,2,2,2,2,3,3)
YEAR <- c(2019,2019,2019,2019,2019,2019,2019,2019,2019)
MONTH <- c(3,3,3,6,6,6,7,2,2)
AGE <- c(18,18,19,10,10,11,11,33,33)
tb <- tibble(ID, YEAR, MONTH, AGE)
tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = case_when(n_distinct(AGE) != 1 ~ top_n(1,AGE),
                             n_distinct(AGE) == 1 ~ as.numeric(AGE),
                             TRUE ~ NA_character_))
I'm getting the following error. Any help understanding/troubleshooting is greatly appreciated. I'd like a solution using dplyr if possible.
Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "c('double', 'numeric')"
Can't you just take the maximum within each group?
tb %>% group_by(ID, YEAR, MONTH) %>% mutate(max_age = max(AGE))
#> # A tibble: 9 x 5
#> # Groups:   ID, YEAR, MONTH [4]
#>      ID  YEAR MONTH   AGE max_age
#>   <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1     1  2019     3    18      19
#> 2     1  2019     3    18      19
#> 3     1  2019     3    19      19
#> 4     2  2019     6    10      11
#> 5     2  2019     6    10      11
#> 6     2  2019     6    11      11
#> 7     2  2019     7    11      11
#> 8     3  2019     2    33      33
#> 9     3  2019     2    33      33
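top_n() takes a data frame (tibble) as its first argument and returns a tibble, so it cannot be applied to a plain vector inside mutate(). In top_n(1, AGE) the number 1 is passed where the data frame is expected, which is why the tbl_vars error mentions an object of class "c('double', 'numeric')". Furthermore, you do not need to cast AGE to numeric, as it is already of this type. Finally, because you want a numeric result, you need to use NA_real_ rather than NA_character_. You can modify your code this way: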
tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = case_when(n_distinct(AGE) != 1 ~ max(AGE),
                             n_distinct(AGE) == 1 ~ AGE,
                             TRUE ~ NA_real_))
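As a quick sanity check, here is a minimal sketch comparing the corrected case_when() version with the plain max(AGE) version on the question's data. The object names simple and with_case_when are just illustrative, and the sketch assumes a dplyr version in which case_when() recycles length-1 conditions.

```r
library(dplyr)

# Same data as in the question
tb <- tibble(
  ID    = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  YEAR  = rep(2019, 9),
  MONTH = c(3, 3, 3, 6, 6, 6, 7, 2, 2),
  AGE   = c(18, 18, 19, 10, 10, 11, 11, 33, 33)
)

# Plain group-wise maximum
simple <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = max(AGE)) %>%
  ungroup()

# Corrected case_when() version
with_case_when <- tb %>%
  group_by(ID, YEAR, MONTH) %>%
  mutate(max_age = case_when(
    n_distinct(AGE) != 1 ~ max(AGE),
    n_distinct(AGE) == 1 ~ AGE,
    TRUE ~ NA_real_
  )) %>%
  ungroup()

# Compare the two resulting columns
identical(simple$max_age, with_case_when$max_age)
```

If the two columns come out identical, the case_when() branches add nothing here: when a group holds a single distinct AGE, max(AGE) is that same value, so max(AGE) alone covers both cases.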