Creating a new factor variable by id using group_by and mutate in dplyr

Question

I have a hierarchical data frame in long format, where each row represents a relationship, and many relationships can belong to a single person. Here is code for a small example dataset:

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

  id partner kiss
1  1       1  Yes
2  1       2   No
3  1       3   No
4  2       1   No
5  2       2   No
6  3       1  Yes
7  4       1  Yes
8  4       2   No

I want to create a new factor variable in this dataset that indicates whether the person (identified by the 'id' variable) never kissed any of their 'partners'. In other words, if the person kissed any of their partners, the new variable should be 'No'; if they never kissed any partner, it should be 'Yes'. Here is what I think it should look like:

  id partner kiss neverkiss
1  1       1  Yes        No
2  1       2   No        No
3  1       3   No        No
4  2       1   No       Yes
5  2       2   No       Yes
6  3       1  Yes        No
7  4       1  Yes        No
8  4       2   No        No

Ideally, I would like to find a way to create this variable without reshaping the dataset. I would also prefer to use the dplyr package. So far, I've thought about using the group_by and mutate functions in this package to create this variable. However, I'm not sure which helper functions I can use to create my specific variable. I'm open to ideas outside of the dplyr package, but a dplyr solution would be my first choice.

Answer 1

This should do it:

library(dplyr)

df <- data.frame(id = as.factor(c(1,1,1,2,2,3,4,4)),
             partner = c(1,2,3,1,2,1,1,2),
             kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

df_new <- df %>% 
   group_by(id) %>% 
   mutate("neverkiss" = {if (any(kiss == "Yes")) "No" else "Yes"})

df_new

If the new column should be a factor, you have to ungroup first:

df_new <- df %>% 
   group_by(id) %>% 
   mutate("neverkiss" = {if (any(kiss == "Yes")) "No" else "Yes"}) %>% 
   ungroup() %>% 
   mutate("neverkiss" = as.factor(neverkiss))

class(df_new$neverkiss)
[1] "factor"
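If you know the possible values in advance, another option is to build the factor inside the grouped `mutate()` with an explicit `levels` argument, so every group yields a factor with exactly the same levels. This is a sketch that assumes a recent dplyr (1.0 or later); the `levels = c("No", "Yes")` ordering is my own choice and not part of the original answer.

```r
library(dplyr)

df <- data.frame(id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
                 partner = c(1, 2, 3, 1, 2, 1, 1, 2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

df_new <- df %>%
  group_by(id) %>%
  # One value per group: "No" if any kiss was "Yes", otherwise "Yes".
  # The length-1 result is recycled over the group's rows, and fixing the
  # levels (assumed order: "No", "Yes") keeps every group's factor identical.
  mutate(neverkiss = factor(if (any(kiss == "Yes")) "No" else "Yes",
                            levels = c("No", "Yes"))) %>%
  # Drop the grouping from the result.
  ungroup()

class(df_new$neverkiss)
levels(df_new$neverkiss)
```

Fixing the levels up front also means a dataset where every person kissed someone still gets a factor with both levels.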

The reason is that factors can't simply be combined:

a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No")) 

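c(a, b) # meaningless: before R 4.1.0 this returns the integer codes 1 1 1 1 1 1, not a factor (R 4.1.0 and later return a factor)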

Because the grouping is still active, mutate basically builds the vector neverkiss by combining the vectors for each id (group), which results in a vector with just one level (in this case "No").
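For comparison, here is a small illustration (not part of the original answer) of two base R ways to combine the factors `a` and `b` without losing their labels: going through character and rebuilding the factor, or using `unlist()`, which returns a factor whose levels are the union of the inputs' levels.

```r
a <- as.factor(c("Yes", "Yes", "Yes"))
b <- as.factor(c("No", "No", "No"))

# Convert to character first, then let factor() rebuild the levels
# (sorted alphabetically by default).
ab_chr <- factor(c(as.character(a), as.character(b)))

# unlist() on a list of factors gives a factor with the union of the levels.
ab_unl <- unlist(list(a, b))

ab_chr
ab_unl
```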

Answer 2

We can also do it with data.table:

library(data.table)
setDT(df)[, neverkiss := if (any(kiss == "Yes")) "No" else "Yes", by = id]
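Note that this data.table version stores `neverkiss` as a character column. If you need a factor, one option is to convert the column in a second, ungrouped `:=` step. This is a sketch; the level order `c("No", "Yes")` is an assumption.

```r
library(data.table)

df <- data.frame(id = as.factor(c(1, 1, 1, 2, 2, 3, 4, 4)),
                 partner = c(1, 2, 3, 1, 2, 1, 1, 2),
                 kiss = as.factor(c("Yes", "No", "No", "No", "No", "Yes", "Yes", "No")))

# Per id: "No" if any kiss is "Yes", otherwise "Yes" (recycled over the group's rows).
setDT(df)[, neverkiss := if (any(kiss == "Yes")) "No" else "Yes", by = id]

# Convert the whole column at once, outside any grouping (assumed level order).
df[, neverkiss := factor(neverkiss, levels = c("No", "Yes"))]

df
```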

Answer 1: 8 votes, accepted, 2015-12-07 10:20:12

Answer 2: 4 votes, 2015-12-07 10:24:14
