Custom mutate of a new column based on two other columns in R using dplyr

Question

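My goal is to create a new df column whose values are based on two other columns. My dataset concerns recruitment into a study. I want a column that defines whether a person is in a particular round of the study and, if so, whether it is their first participation, their second, their third, and so on (up to 8 rounds). At the moment I am trying to do this with mutate(case_when()) and lag(). However, it gives wrong results if a person misses a round of the study and comes back later. The dataset looks like this: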

    person |  round  |  in_round  |
       A        1           1
       A        2           1
       A        3           1
       A        4           1
       A        5           1
       A        6           0
       A        7           0
       A        8           0
       B        1           0
       B        2           0
       B        3           1
       B        4           1
       B        5           1
       B        6           1
       B        7           0
       B        8           1
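
For anyone who wants to reproduce the problem, the table above can be rebuilt as a data frame along these lines. This is only a sketch: the object name `df` is an assumption borrowed from the answer's code, and `in_round` is assumed to be numeric 0/1.

```r
# Sample data from the question, rebuilt as a plain data frame.
# The name `df` is an assumption (it matches the name used in the answer).
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,   # person A: rounds 1-5
               0, 0, 1, 1, 1, 1, 0, 1),  # person B: rounds 3-6 and 8
  stringsAsFactors = FALSE
)
```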

What I need is a separate column that, for each person, uses round and in_round to produce the following:

    person |  round  |  in_round  |  round_status
       A        1           1         recruited
       A        2           1        follow_up_1
       A        3           1        follow_up_2
       A        4           1        follow_up_3
       A        5           1        follow_up_4
       A        6           0           none
       A        7           0           none
       A        8           0           none
       B        1           0           none
       B        2           0           none
       B        3           1         recruited
       B        4           1        follow_up_1
       B        5           1        follow_up_2
       B        6           1        follow_up_3
       B        7           0           none
       B        8           1        follow_up_4

In summary:

Where in_round == 0, round_status == "none"
The first time in_round == 1, round_status == "recruited"
On subsequent times in_round == 1, round_status == "follow_up_X" (where X depends on the number of previous waves the person was in).

Answer 1

Try this:

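    library(dplyr)

    df %>%
      # sort by person and round so the output keeps the question's row order
      arrange(person, round) %>%
      group_by(person) %>%
      mutate(
        # running count of rounds attended so far; a missed round does not reset it
        cum_round = cumsum(in_round),
        round_status = case_when(
          in_round == 0  ~ "none",
          cum_round == 1 ~ "recruited",
          TRUE           ~ paste0("follow_up_", cum_round - 1)
        )
      ) %>%
      ungroup()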
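
To check the cumsum() idea against the expected output in the question, here is a self-contained sketch. It assumes the sample data is held in a data frame named `df` with a numeric 0/1 `in_round` column, and it sorts by person and round before grouping so the rows stay in the question's order. The name `result` is just illustrative.

```r
library(dplyr)

# Sample data from the question (object name `df` is an assumption).
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,
               0, 0, 1, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(person, round) %>%
  group_by(person) %>%
  mutate(
    # running count of rounds attended so far, per person
    cum_round = cumsum(in_round),
    round_status = case_when(
      in_round == 0  ~ "none",
      cum_round == 1 ~ "recruited",
      TRUE           ~ paste0("follow_up_", cum_round - 1)
    )
  ) %>%
  ungroup()

# Person B skips round 7 and returns in round 8
print(filter(result, person == "B"))
```

Because the running count only increases in rounds where in_round is 1, a gap (person B's round 7) leaves the count unchanged, so the return in round 8 is labelled follow_up_4 rather than restarting the sequence.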

Accepted (2), 2020-03-03 17:19:27
