Restructuring/reshaping a data frame (R)

Question

My dataset has repeated observations for people who work on projects. I need a data frame with two columns that list the 'combinations' of projects for each person and time point. Let me explain with an example.

This is my data:

ID    Week    Project    
01    1       101
01    1       102 
01    1       103
01    2       101
01    2       102
02    1       101
02    1       102
02    2       101

Person 1 (ID = 1) worked on three projects in week 1. This means that there are six possible combinations of projects (project_i & project_j) for this person in this week.

This is what I need:

ID   Week    Project_i  Project_j
01    1      101        101
01    1      101        102
01    1      101        103
01    1      102        101
01    1      102        102    
01    1      102        103
01    1      103        101
01    1      103        102
01    1      103        103
01    2      101        101
01    2      101        102
01    2      102        101
01    2      102        102
02    1      101        101
02    1      101        102
02    1      102        101
02    1      102        102
02    2      101        101

Losing cases that only have one project per week is not an issue.

I have tried base R and reshape2 for a while, but I can't figure this out.

Answer 1

Here is a solution that uses dplyr and tidyr. The key step is tidyr::complete() combined with dplyr::group_by(): on grouped data, complete() fills in every combination of Project_i and Project_j within each ID/Week group.

library(dplyr)
library(tidyr)

d %>% 
  rename(Project_i = Project) %>%
  mutate(Project_j = Project_i) %>% 
  group_by(ID, Week) %>%
  complete(Project_i, Project_j) %>%
  filter(Project_i != Project_j)
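
For anyone who wants to run the pipeline above end to end, here is a minimal sketch that rebuilds the example data from the question. The numeric IDs (1 instead of 01) and the result name `pairs` are my own assumptions for illustration. This variant leaves out the final filter() so that self-pairs such as (101, 101) are kept, which matches the 18-row table shown in the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question (IDs stored as plain numbers: an assumption)
d <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2, 2),
  Week    = c(1, 1, 1, 2, 2, 1, 1, 2),
  Project = c(101, 102, 103, 101, 102, 101, 102, 101)
)

pairs <- d %>%
  rename(Project_i = Project) %>%
  mutate(Project_j = Project_i) %>%   # duplicate the project column
  group_by(ID, Week) %>%
  complete(Project_i, Project_j) %>%  # all i/j combinations within each group
  ungroup()

nrow(pairs)  # 9 + 4 + 4 + 1 rows across the four ID/Week groups
```

Adding filter(Project_i != Project_j) back at the end drops the self-pairs again and leaves 10 rows.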

Answer 2

Here's one way:

library(data.table)
setDT(DT)

DT[, CJ(P1 = Project, P2 = Project)[P1 != P2], by=.(ID, Week)]

    ID Week  P1  P2
 1:  1    1 101 102
 2:  1    1 101 103
 3:  1    1 102 101
 4:  1    1 102 103
 5:  1    1 103 101
 6:  1    1 103 102
 7:  1    2 101 102
 8:  1    2 102 101
 9:  2    1 101 102
10:  2    1 102 101

CJ is the cross join (Cartesian product) of two vectors, taking all combinations.

If you don't want both (101,102) and (102,101), use P1 > P2 instead of P1 != P2. The OP has since changed the question, so use P1 <= P2.
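
To make the difference between these filters concrete, here is a small sketch that applies each one to the example data from the question. The table is rebuilt by hand with numeric IDs, and the result names (`ordered_pairs`, `unordered_pairs`, `with_self`, `all_pairs`) are hypothetical names chosen for illustration.

```r
library(data.table)

# Example data from the question (IDs stored as plain numbers: an assumption)
DT <- data.table(
  ID      = c(1, 1, 1, 1, 1, 2, 2, 2),
  Week    = c(1, 1, 1, 2, 2, 1, 1, 2),
  Project = c(101, 102, 103, 101, 102, 101, 102, 101)
)

# Both orders, no self-pairs: (101,102) and (102,101)
ordered_pairs   <- DT[, CJ(P1 = Project, P2 = Project)[P1 != P2], by = .(ID, Week)]
# One row per unordered pair, no self-pairs
unordered_pairs <- DT[, CJ(P1 = Project, P2 = Project)[P1 > P2],  by = .(ID, Week)]
# One row per unordered pair, self-pairs such as (101,101) included
with_self       <- DT[, CJ(P1 = Project, P2 = Project)[P1 <= P2], by = .(ID, Week)]
# No filter at all: every ordered pair, self-pairs included
all_pairs       <- DT[, CJ(P1 = Project, P2 = Project), by = .(ID, Week)]

sapply(list(ordered_pairs, unordered_pairs, with_self, all_pairs), nrow)
```

Dropping the filter entirely gives the full cross join per group, which is the variant that lists both orders and the self-pairs.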

Answer 3

Here's a base option using expand.grid:

do.call(rbind, lapply(split(df, paste(df$ID, df$Week)), function(x){
    x2 <- expand.grid(ID = unique(x$ID), 
                      Week = unique(x$Week), 
                      Project_i = unique(x$Project), 
                      Project_j = unique(x$Project))
    # drop this filter if (101, 102) should count separately from (102, 101);
    # use `<` instead of `<=` if self-pairs such as (101, 101) are not wanted
    x2[x2$Project_i <= x2$Project_j,]
}))

#       ID Week Project_i Project_j
# 1 1.1  1    1       101       101
# 1 1.4  1    1       101       102
# 1 1.5  1    1       102       102
# 1 1.7  1    1       101       103
# 1 1.8  1    1       102       103
# 1 1.9  1    1       103       103
# 1 2.1  1    2       101       101
# 1 2.3  1    2       101       102
# 1 2.4  1    2       102       102
# 2 1.1  2    1       101       101
# 2 1.3  2    1       101       102
# 2 1.4  2    1       102       102
# 2 2    2    2       101       101

Answer 1
6 2016-04-13 18:50:43

Answer 2
6 (accepted) 2016-04-13 18:50:55

Answer 3
5 2016-04-13 19:14:13