Restructure / reshape data frame (R)
My dataset has repeated observations of people who work on projects. I need a data frame with two columns that list the 'combinations' of projects for each person and time point. Let me explain with an example:
This is my data:
ID Week Project
01 1 101
01 1 102
01 1 103
01 2 101
01 2 102
02 1 101
02 1 102
02 2 101
Person 1 (ID = 1) worked on three projects in week 1. This means that there are six ordered pairs of distinct projects (project_i & project_j) for this person in this week, or nine once each project is also paired with itself, as in the table below.
This is what I need:
ID Week Project_i Project_j
01 1 101 101
01 1 101 102
01 1 101 103
01 1 102 101
01 1 102 102
01 1 102 103
01 1 103 101
01 1 103 102
01 1 103 103
01 2 101 101
01 2 101 102
01 2 102 101
01 2 102 102
02 1 101 101
02 1 101 102
02 1 102 101
02 1 102 102
02 2 101 101
Losing cases that only have one project per week is not an issue.
I have tried base R and reshape2 for a bit, but I can't figure this out.
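For anyone who wants to run the answers, the example data can be rebuilt as follows. This is a minimal sketch that assumes numeric IDs (so 01 prints as 1) and uses the name d from the dplyr answer; the other answers refer to the same data as DT and df.

```r
# Example data from the question.
# Assumption: IDs are stored as numbers, so the leading zeros are dropped.
d <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2, 2),
  Week    = c(1, 1, 1, 2, 2, 1, 1, 2),
  Project = c(101, 102, 103, 101, 102, 101, 102, 101)
)

# Number of projects per person and week
table(d$ID, d$Week)
```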
Here is a solution that uses dplyr and tidyr. The key step is tidyr::complete() combined with dplyr::group_by().
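library(dplyr)
library(tidyr)
# d is the data frame from the question (columns ID, Week, Project)
d %>%
  rename(Project_i = Project) %>%
  mutate(Project_j = Project_i) %>%     # duplicate the project column
  group_by(ID, Week) %>%
  complete(Project_i, Project_j) %>%    # all i/j combinations within each ID/Week
  filter(Project_i != Project_j)        # drops self-pairs; omit to keep rows like 101/101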
Here's one way:
library(data.table)
setDT(DT)
DT[, CJ(P1 = Project, P2 = Project)[P1 != P2], by=.(ID, Week)]
ID Week P1 P2
1: 1 1 101 102
2: 1 1 101 103
3: 1 1 102 101
4: 1 1 102 103
5: 1 1 103 101
6: 1 1 103 102
7: 1 2 101 102
8: 1 2 102 101
9: 2 1 101 102
10: 2 1 102 101
CJ (cross join) forms the Cartesian product of the two vectors, i.e. all combinations.
If you don't want both (101,102) and (102,101), use P1 > P2 instead of P1 != P2. Edit: the OP has since changed the question (the desired output now includes self-pairs such as 101/101), so use P1 <= P2 in that case.
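To see what each filter keeps, the pairs for a single ID/week group can be counted in base R. This is only an illustrative sketch: it uses expand.grid as a stand-in for CJ, and the names projects, pairs and counts are made up for the example.

```r
# One ID/week group with three projects
projects <- c(101, 102, 103)
pairs <- expand.grid(P1 = projects, P2 = projects)

counts <- c(
  all       = nrow(pairs),                           # every ordered pair, self-pairs included
  not_equal = nrow(pairs[pairs$P1 != pairs$P2, ]),   # ordered pairs of distinct projects
  greater   = nrow(pairs[pairs$P1 >  pairs$P2, ]),   # each unordered pair once
  less_eq   = nrow(pairs[pairs$P1 <= pairs$P2, ])    # each unordered pair once, plus self-pairs
)
counts
```

For three projects this gives 9, 6, 3 and 6 rows respectively, so only the unfiltered cross product has the nine rows shown for ID 01, week 1 in the question's desired table.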
Here's a base option using expand.grid:
do.call(rbind, lapply(split(df, paste(df$ID, df$Week)), function(x){
    x2 <- expand.grid(ID = unique(x$ID),
                      Week = unique(x$Week),
                      Project_i = unique(x$Project),
                      Project_j = unique(x$Project))
    # omit this filter if (101, 102) counts as different from (102, 101);
    # use `<` instead if self-pairs such as (101, 101) are not wanted
    x2[x2$Project_i <= x2$Project_j,]
}))
# ID Week Project_i Project_j
# 1 1.1 1 1 101 101
# 1 1.4 1 1 101 102
# 1 1.5 1 1 102 102
# 1 1.7 1 1 101 103
# 1 1.8 1 1 102 103
# 1 1.9 1 1 103 103
# 1 2.1 1 2 101 101
# 1 2.3 1 2 101 102
# 1 2.4 1 2 102 102
# 2 1.1 2 1 101 101
# 2 1.3 2 1 101 102
# 2 1.4 2 1 102 102
# 2 2 2 2 101 101
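If every ordered pair is wanted, self-pairs included, as in the question's desired table, a self-merge on ID and Week is another base R possibility. This is a sketch that is not part of the answer above; it assumes the question's data with numeric IDs, and the name pairs is chosen for the example.

```r
# Example data from the question (assumption: numeric IDs)
df <- data.frame(
  ID      = c(1, 1, 1, 1, 1, 2, 2, 2),
  Week    = c(1, 1, 1, 2, 2, 1, 1, 2),
  Project = c(101, 102, 103, 101, 102, 101, 102, 101)
)

# Self-merge: each row is paired with every row that shares its ID and Week.
# The duplicated Project column gets the suffixes _i and _j.
pairs <- merge(df, df, by = c("ID", "Week"), suffixes = c("_i", "_j"))

# Sort to match the layout of the desired table and reset the row names
pairs <- pairs[order(pairs$ID, pairs$Week, pairs$Project_i, pairs$Project_j), ]
rownames(pairs) <- NULL

pairs
```

The result has 18 rows (9 + 4 + 4 + 1 across the four ID/week groups), the same number as the desired table. Adding a filter such as pairs[pairs$Project_i <= pairs$Project_j, ] would reproduce the reduced output shown above.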