I have a data frame object where combinations of variables are represented by 1, but which is sparsely populated in that I do not have all combinations mapped out.
For example:
A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100
... which is missing the potential combinations [101] and [111]
From this, I'd like to expand out all combinations of A, B, and C, taking the outcome value where the combination exists, and where not, populate Outcome with a zero.
For example:
A B C Outcome
1 0 0 700
1 1 0 280
1 0 1 0 <- new row
1 1 1 0 <- new row
0 1 0 900
0 1 1 100
0 0 1 450
I'm afraid I don't really have any idea how to do this functionally. I've had a look at expand.grid(), for example the following, which also uses the plyr package:
expand.grid(rlply(n, c(0,1)))
which for n=3 gives
Var1 Var2 Var3
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
which pretty much gives me the grid I'm after, but I'm not clear now how to join my "Outcome" values to this grid, particularly where n is large (say 60 or 70 variables).
Any help gratefully received!
df <- read.table(text =
"A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100",
header = TRUE)
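# Build the full grid from the unique values of every column except the
# last one (Outcome), then left-join the known outcomes onto that grid.
res <- merge(
  x = do.call(what = "expand.grid", lapply(head(as.list(df), -1), unique)),
  y = df,
  all.x = TRUE
)
# Combinations that are absent from df come back as NA; set them to 0.
res$Outcome[is.na(res$Outcome)] <- 0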
res
# A B C Outcome
# 1 0 0 0 0
# 2 0 0 1 450
# 3 0 1 0 900
# 4 0 1 1 100
# 5 1 0 0 700
# 6 1 0 1 0
# 7 1 1 0 280
# 8 1 1 1 0
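One caveat for the "60 or 70 variables" case: the full grid has one row per combination, so with 60 binary columns it would have 2^60 rows, which will not fit in memory. It may be worth checking the size before expanding. A minimal sketch, assuming the key columns are everything except Outcome (the names key_cols and n_rows are only illustrative):

```r
df <- read.table(text =
"A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100",
header = TRUE)

# Key columns: every column except the outcome (assumed to be named "Outcome")
key_cols <- setdiff(names(df), "Outcome")

# Rows in the full grid = product of the number of distinct values per key column
n_rows <- prod(vapply(df[key_cols], function(x) length(unique(x)), integer(1)))
n_rows

# For comparison, the row count for 60 binary columns
2^60
```

If n_rows is far beyond what memory allows, expanding every combination is probably not feasible, whichever function is used.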
Edit:
Not sure whether it should go in a separate answer, but here is a more elegant way with the tidyr package:
library(tidyr)
complete(df, A, B, C, fill = list(Outcome = 0))
If you want to avoid typing all 60 or 70 column names:
complete_(df, cols = setdiff(names(df), "Outcome"), fill = list(Outcome = 0))
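Note that complete_() is deprecated in later tidyr releases. A possible alternative, sketched here on the assumption that a recent tidyr (with rlang, which it depends on) is installed, is to splice the column names into complete() as symbols. The name key_cols is illustrative only:

```r
library(tidyr)

df <- read.table(text =
"A B C Outcome
1 0 0 700
0 1 0 900
0 0 1 450
1 1 0 280
0 1 1 100",
header = TRUE)

# Every column except the outcome is treated as a key column
key_cols <- setdiff(names(df), "Outcome")

# Turn the names into symbols and splice them into complete() with !!!
res <- complete(df, !!!rlang::syms(key_cols), fill = list(Outcome = 0))
res
```

This should give the same eight rows as complete(df, A, B, C, fill = list(Outcome = 0)), without typing the column names by hand.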