
Using group_by and mutate in the dplyr package to create a new factor variable by id variable

I have a hierarchical data frame in long format, where each row represents a relationship, and many relationships can belong to a single person. Here is code for a small example dataset:

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

  id partner kiss
1  1       1  Yes
2  1       2   No
3  1       3   No
4  2       1   No
5  2       2   No
6  3       1  Yes
7  4       1  Yes
8  4       2   No

I want to create a new factor variable in this dataset that indicates whether the person (identified by the 'id' variable) never kissed any of their partners. In other words, if the person did not kiss any of their partners, the new variable would indicate 'Yes': they never had a kiss with any partner. Here is what I think it should look like:

  id partner kiss neverkiss
1  1       1  Yes        No
2  1       2   No        No
3  1       3   No        No
4  2       1   No       Yes
5  2       2   No       Yes
6  3       1  Yes        No
7  4       1  Yes        No
8  4       2   No        No

Ideally, I would like to find a way to create this variable without reshaping the dataset. I would also prefer to use the dplyr package. So far, I've thought about using the group_by and mutate functions in this package to create this variable. However, I'm not sure which helper functions I can use to create my specific variable. I'm open to other ideas outside of the dplyr package, but a dplyr solution would be my first choice.

This should do it:

library(dplyr)

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

# For each id, "No" if any row has kiss == "Yes", otherwise "Yes"
df_new <- df %>%
   group_by(id) %>%
   mutate(neverkiss = if (any(kiss == "Yes")) "No" else "Yes")

df_new

If the new column should be a factor, ungroup first and then convert it:

df_new <- df %>%
   group_by(id) %>%
   mutate(neverkiss = if (any(kiss == "Yes")) "No" else "Yes") %>%
   ungroup() %>%
   # convert after ungrouping so the factor is built from the whole column
   mutate(neverkiss = as.factor(neverkiss))

class(df_new$neverkiss)
[1] "factor"

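The reason is that factors can't be reliably combined with c(). Before R 4.1.0 the result is just the underlying integer codes:

a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))

c(a, b) # meaningless before R 4.1.0: returns the integer codes 1 1 1 1 1 1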

Because grouping is still active, mutate essentially builds the vector neverkiss by combining one vector per id (group), which results in a vector with just one level (in this case "No").
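To illustrate the point about combining factors, here is a minimal base R sketch. It assumes it is acceptable to convert each piece to character and rebuild the factor with an explicit `levels` argument. The name `combined` is only for illustration. The result of calling `c()` directly on factors depends on the R version, so this explicit route is meant to behave the same everywhere.

```r
a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))

# Go through character so the labels, not the integer codes, are combined,
# then rebuild the factor with both levels declared explicitly.
combined <- factor(c(as.character(a), as.character(b)),
                   levels = c("No", "Yes"))

levels(combined)
table(combined)
```

Declaring the levels up front also keeps both "No" and "Yes" as levels even when one of them does not occur in the data.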

We can also do it with data.table:

library(data.table)
setDT(df)[, neverkiss := if (any(kiss == "Yes")) "No" else "Yes", by = id]
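Since the question is open to ideas outside dplyr, the same per-id logic can also be sketched in base R with `ave()`, without reshaping the data. This is a sketch and not part of the answers above. The helper name `no_kiss` is hypothetical, and the example rebuilds `df` so it runs on its own.

```r
df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
                 partner = c(1,2,3,1,2,1,1,2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

# TRUE on every row of an id whose rows contain no "Yes";
# ave() recycles the single per-group result across that group's rows.
no_kiss <- ave(df$kiss == "Yes", df$id, FUN = function(x) !any(x))

# Build the factor once, on the whole column, with explicit levels.
df$neverkiss <- factor(ifelse(no_kiss, "Yes", "No"), levels = c("No", "Yes"))

df
```

Because the factor is created in one step on the full column, no per-group factors need to be combined.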
