简体   繁体   中英

Custom mutate new column based on two other columns in R using dplyr

My aim is to create a new df column for which the values are based on two other columns. My data set concerns recruitment into a study. I would like a column that defines whether or not a person was in a particular round of the study, if so was it their first involvement, their second, third and so on (up to 8 rounds). Currently I am attempting this with mutate(case_when)) in dplyr and using lag() . However, it works incorrectly if a person missed a round of the study and later came back into it. The data set looks like this:

    person |  round  |  in_round  |
       A        1           1
       A        2           1
       A        3           1
       A        4           1
       A        5           1
       A        6           0
       A        7           0
       A        8           0
       B        1           0
       B        2           0
       B        3           1
       B        4           1
       B        5           1
       B        6           1
       B        7           0
       B        8           1

What I need is a separate column that uses round and in_round for each person to produce the following:

    person |  round  |  in_round  |  round_status
       A        1           1         recruited
       A        2           1        follow_up_1
       A        3           1        follow_up_2
       A        4           1        follow_up_3
       A        5           1        follow_up_4
       A        6           0           none
       A        7           0           none
       A        8           0           none
       B        1           0           none
       B        2           0           none
       B        3           1         recruited
       B        4           1        follow_up_1
       B        5           1        follow_up_2
       B        6           1        follow_up_3
       B        7           0            none
       B        8           1        follow_up_4

In summary:

  • where in_round == 0 , round_status == "none"
  • the first time in_round == 1 , round_status == "recruited"
  • subsequent times in_round == 1 , round_status == "follow_up_X" (dependent on the number of previous waves the individual was present in).

Try this:

df %>% 
  group_by(person) %>%
  arrange(round) %>%
  mutate(cum_round = cumsum(in_round),
         round_status = case_when(
    in_round == 0 ~ "none",
    cum_round == 1 ~ "recruited",
    TRUE ~ paste0("follow_up_", cum_round - 1)
  ))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM