Expand multiple factors as columns in R
I have a data frame that looks like this:
> test <- data.frame(ID = c(1,2,3,4,5),
+                    ATTR1 = c("A","A","B","C","C"),
+                    ATTR2 = c("A2","A2","B2","B2","B2"),
+                    ATTR3 = c("A3","A3","A3","B3","B3"))
> test
  ID ATTR1 ATTR2 ATTR3
1  1     A    A2    A3
2  2     A    A2    A3
3  3     B    B2    A3
4  4     C    B2    B3
5  5     C    B2    B3
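From this data frame, I am trying to obtain the following data frame, with one 0/1 indicator column per attribute value: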
> desired_frame <- data.frame(ID = c(1,2,3,4,5),
+                             A = c(1,1,0,0,0), B = c(0,0,1,0,0), C = c(0,0,0,1,1),
+                             A2 = c(1,1,0,0,0), B2 = c(0,0,1,1,1),
+                             A3 = c(1,1,1,0,0), B3 = c(0,0,0,1,1))
> desired_frame
  ID A B C A2 B2 A3 B3
1  1 1 0 0  1  0  1  0
2  2 1 0 0  1  0  1  0
3  3 0 1 0  0  1  1  0
4  4 0 0 1  0  1  0  1
5  5 0 0 1  0  1  0  1
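I tried using dcast, but without success. It creates one column per combination of values rather than one column per value: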
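> library(reshape2)  # assumed source of dcast(); the original post does not show which package was loaded
> test$PROXY <- rep(1, nrow(test))
> dcast(test, ID ~ ATTR1 + ATTR2 + ATTR3, fun.aggregate = mean, value.var = "PROXY")
  ID A_A2_A3 B_B2_A3 C_B2_B3
1  1       1     NaN     NaN
2  2       1     NaN     NaN
3  3     NaN       1     NaN
4  4     NaN     NaN       1
5  5     NaN     NaN       1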
Any help would be much appreciated.
Here is the long route to get there!
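library(reshape2)  # melt() comes from reshape2, not tidyr
library(tidyr)     # spread()
# Long format: one row per (ID, attribute) pair
df = melt(test, id.vars = "ID", measure.vars = c("ATTR1", "ATTR2", "ATTR3"))
# One column per attribute value; cells hold the attribute name or NA
df1 = spread(df, value, variable)
# Non-missing cells become 1, missing cells become 0
cbind(df1[1], (!is.na(df1[-1]))+0)
#  ID A A2 A3 B B2 B3 C
#1  1 1  1  1 0  0  0 0
#2  2 1  1  1 0  0  0 0
#3  3 0  0  1 1  1  0 0
#4  4 0  0  0 0  1  1 1
#5  5 0  0  0 0  1  1 1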
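In current tidyr releases, spread() is superseded by pivot_wider(), so the same long-then-wide idea can be sketched with pivot_longer() and pivot_wider(). This is only an illustrative variant, not the answer's original code; the helper column name present is made up for the example, and the column order of the result follows first appearance rather than alphabetical order.

```r
library(tidyr)

# Sample data from the question
test <- data.frame(ID = c(1, 2, 3, 4, 5),
                   ATTR1 = c("A", "A", "B", "C", "C"),
                   ATTR2 = c("A2", "A2", "B2", "B2", "B2"),
                   ATTR3 = c("A3", "A3", "A3", "B3", "B3"))

# Long format: one row per (ID, attribute) pair
long <- pivot_longer(test, cols = -ID, names_to = "variable", values_to = "value")

# Hypothetical marker column that becomes the cell value after widening
long$present <- 1L

# Wide format: one column per attribute value, missing combinations filled with 0
wide <- pivot_wider(long[c("ID", "value", "present")],
                    names_from = value,
                    values_from = present,
                    values_fill = list(present = 0L))
wide <- as.data.frame(wide)
wide
```

Because each ID has each value at most once, no aggregation function is needed here; with repeated values per ID, a values_fn argument would have to be supplied.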
Here is a base R solution using model.matrix, lapply, and do.call:
df <- do.call(cbind, c(test[1], lapply(names(test)[-1],
          function(i) model.matrix(reformulate(c(i, -1)), data=test))))
df
  ID ATTR1A ATTR1B ATTR1C ATTR2A2 ATTR2B2 ATTR3A3 ATTR3B3
1  1      1      0      0       1       0       1       0
2  2      1      0      0       1       0       1       0
3  3      0      1      0       0       1       1       0
4  4      0      0      1       0       1       0       1
5  5      0      0      1       0       1       0       1
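reformulate with -1 returns a formula that contains a single variable and drops the intercept, which lets every factor level appear as its own column. model.matrix takes this formula and builds a matrix of indicator columns for the factor levels. lapply applies this to each factor variable and returns a list of matrices. Finally, do.call combines the matrices in the list with the ID variable. Note that this returns a matrix.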
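To make the individual steps concrete, here is a small sketch for a single column (ATTR1). It uses the intercept = FALSE argument of reformulate instead of the c(i, -1) form from the answer; the two are intended to be equivalent here, and the variable names f and m are chosen only for this illustration.

```r
# Sample data from the question
test <- data.frame(ID = c(1, 2, 3, 4, 5),
                   ATTR1 = c("A", "A", "B", "C", "C"),
                   ATTR2 = c("A2", "A2", "B2", "B2", "B2"),
                   ATTR3 = c("A3", "A3", "A3", "B3", "B3"))

# Step 1: a one-variable formula without an intercept, i.e. ~ATTR1 - 1
f <- reformulate("ATTR1", intercept = FALSE)

# Step 2: expand the variable into one 0/1 column per level
m <- model.matrix(f, data = test)
colnames(m)
# "ATTR1A" "ATTR1B" "ATTR1C"

# With the intercept kept, the first level is absorbed into "(Intercept)"
colnames(model.matrix(~ ATTR1, data = test))
# "(Intercept)" "ATTR1B" "ATTR1C"
```

The second call shows why the intercept has to be removed: otherwise the column for level A would be missing.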
To get a data.frame instead, replace cbind with data.frame:
df <- do.call(data.frame, c(test[1], lapply(names(test)[-1],
function(i) model.matrix(reformulate(c(i, -1)), data=test))))
To rename the columns, you can use sub:
colnames(df) <- sub("ATTR\\d+", "", colnames(df))
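As a quick illustration of what this pattern does, the sketch below applies it to a plain character vector of the column names shown above (the vector nms is typed in by hand for the example). The pattern removes the first match of "ATTR" followed by one or more digits, so it presumably needs adjusting if a level name itself starts with a digit.

```r
# Column names as produced by the model.matrix approach
nms <- c("ID", "ATTR1A", "ATTR1B", "ATTR1C",
         "ATTR2A2", "ATTR2B2", "ATTR3A3", "ATTR3B3")

# Strip the "ATTR<number>" prefix; "ID" has no match and is left unchanged
cleaned <- sub("ATTR\\d+", "", nms)
cleaned
# "ID" "A" "B" "C" "A2" "B2" "A3" "B3"
```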
Another base R solution:
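# Unique levels of each attribute column. lapply always returns a list,
# whereas apply() returns a matrix when every column has the same number of levels.
facs <- lapply(test[-1], function(x) unique(as.character(x)))
desired_frame <- test
for (j in seq_along(facs)) {
  # One logical column per level, TRUE where the row has that level
  dummy <- sapply(facs[[j]], "==", test[, j + 1])
  # + 0 converts TRUE/FALSE to 1/0; the new columns are prepended
  desired_frame <- cbind(dummy + 0, desired_frame)
}
desired_frame
##   A3 B3 A2 B2 A B C ID ATTR1 ATTR2 ATTR3
## 1  1  0  1  0 1 0 0  1     A    A2    A3
## 2  1  0  1  0 1 0 0  2     A    A2    A3
## 3  1  0  0  1 0 1 0  3     B    B2    A3
## 4  0  1  0  1 0 0 1  4     C    B2    B3
## 5  0  1  0  1 0 0 1  5     C    B2    B3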