Custom mutate new column based on two other columns in R using dplyr

My aim is to create a new data frame column whose values are based on two other columns. My data set concerns recruitment into a study. I would like a column that records whether a person took part in a particular round of the study and, if so, whether it was their first involvement, their second, third, and so on (up to 8 rounds). Currently I am attempting this with mutate(case_when()) in dplyr, using lag(). However, it gives the wrong result if a person missed a round of the study and later came back. The data set looks like this:

    person |  round  |  in_round  |
       A        1           1
       A        2           1
       A        3           1
       A        4           1
       A        5           1
       A        6           0
       A        7           0
       A        8           0
       B        1           0
       B        2           0
       B        3           1
       B        4           1
       B        5           1
       B        6           1
       B        7           0
       B        8           1

What I need is a separate column that uses round and in_round for each person to produce the following:

    person |  round  |  in_round  |  round_status
       A        1           1         recruited
       A        2           1        follow_up_1
       A        3           1        follow_up_2
       A        4           1        follow_up_3
       A        5           1        follow_up_4
       A        6           0           none
       A        7           0           none
       A        8           0           none
       B        1           0           none
       B        2           0           none
       B        3           1         recruited
       B        4           1        follow_up_1
       B        5           1        follow_up_2
       B        6           1        follow_up_3
       B        7           0            none
       B        8           1        follow_up_4

In summary:

  • where in_round == 0, round_status == "none"
  • the first time in_round == 1, round_status == "recruited"
  • on subsequent occasions where in_round == 1, round_status == "follow_up_X", where X is the number of rounds the individual has taken part in since being recruited.

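Try this:

library(dplyr)

df %>%
  arrange(person, round) %>%   # order rounds within each person
  group_by(person) %>%
  mutate(
    # running count of rounds attended so far (missed rounds add 0)
    cum_round = cumsum(in_round),
    round_status = case_when(
      in_round == 0 ~ "none",
      cum_round == 1 ~ "recruited",
      TRUE ~ paste0("follow_up_", cum_round - 1)
    )
  ) %>%
  ungroup()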
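
The cumulative sum is what makes this robust to missed rounds: it counts only the rounds a person actually attended, so a gap does not reset or shift the numbering the way a lag()-based comparison can. As a minimal, self-contained sketch, the example below rebuilds the data from the question's table (the names df and result are just illustrative) and runs the same logic:

```r
library(dplyr)

# Example data rebuilt from the table in the question
df <- data.frame(
  person   = rep(c("A", "B"), each = 8),
  round    = rep(1:8, times = 2),
  in_round = c(1, 1, 1, 1, 1, 0, 0, 0,
               0, 0, 1, 1, 1, 1, 0, 1)
)

result <- df %>%
  arrange(person, round) %>%
  group_by(person) %>%
  mutate(
    # number of rounds attended up to and including this one
    cum_round = cumsum(in_round),
    round_status = case_when(
      in_round == 0 ~ "none",                      # not present this round
      cum_round == 1 ~ "recruited",                # first round attended
      TRUE ~ paste0("follow_up_", cum_round - 1)   # later rounds attended
    )
  ) %>%
  ungroup()

print(result, n = 16)
```

For person B, round 7 is "none" and round 8 comes out as "follow_up_4", matching the desired output in the question. The helper column cum_round can be dropped afterwards with select(-cum_round) if it is not needed.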


