
R: Join a data frame column to a partially matching grid

I have a data frame in which each row marks a combination of variables with 1s, but it is sparsely populated: not every possible combination is present.

For example:

A   B   C   Outcome
1   0   0   700
0   1   0   900
0   0   1   450
1   1   0   280
0   1   1   100

... which is missing the potential combinations [101] and [111]

From this, I'd like to expand out all combinations of A, B, and C, taking the outcome value where the combination exists, and where not, populate Outcome with a zero.

For example:

A   B   C   Outcome
1   0   0   700
1   1   0   280
1   0   1   0         <- new row
1   1   1   0         <- new row
0   1   0   900
0   1   1   100
0   0   1   450

I'm afraid I don't really have any idea how to do this functionally. I've had a look at expand.grid(); for example, the following, which also uses the plyr package:

expand.grid(rlply(n, c(0,1)))

which for n=3 gives

  Var1 Var2 Var3
1    0    0    0
2    1    0    0
3    0    1    0
4    1    1    0
5    0    0    1
6    1    0    1
7    0    1    1
8    1    1    1

which pretty much gives me the grid I'm after, but I'm not clear now how to join my "Outcome" values to this grid, particularly where n is large (say 60 or 70 variables).
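
As an aside, here is a minimal sketch of the same grid built in base R without plyr, assuming every variable takes only the values 0 and 1. The name `n` follows the question; `grid` is a name introduced here purely for illustration. Note that the grid has 2^n rows, so it can only be materialised for small n; at n = 60 it would need about 1.15e18 rows.

```r
n <- 3

# rep(list(c(0, 1)), n) yields one c(0, 1) vector per variable;
# expand.grid() takes that list and returns every combination.
grid <- expand.grid(rep(list(c(0, 1)), n))

nrow(grid)  # one row per combination, i.e. 2^n rows
```

The columns are named Var1, Var2, ... exactly as in the plyr version above, with the first column varying fastest.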

Any help gratefully received!

df <- read.table(text = 
"A   B   C   Outcome
1   0   0   700
0   1   0   900
0   0   1   450
1   1   0   280
0   1   1   100",
header = TRUE)

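# Build every combination of the predictor columns (all columns except the
# last one, Outcome), then left-join the observed outcomes onto that grid.
grid <- do.call(expand.grid, lapply(head(as.list(df), -1), unique))
res <- merge(x = grid, y = df, all.x = TRUE)  # joins on the shared columns A, B, C
# Combinations that are absent from df come back as NA; set them to 0.
res$Outcome[is.na(res$Outcome)] <- 0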
res
#   A B C Outcome
# 1 0 0 0       0
# 2 0 0 1     450
# 3 0 1 0     900
# 4 0 1 1     100
# 5 1 0 0     700
# 6 1 0 1       0
# 7 1 1 0     280
# 8 1 1 1       0
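
A possible variation, not part of the answer above: because expand.grid() varies its first column fastest, the row position of each observed combination can be computed directly from its 0/1 values, which avoids the merge. This is only a sketch under the assumption that every predictor column is strictly 0/1; the names `vars`, `grid`, and `idx` are introduced here for illustration.

```r
df <- read.table(text =
"A   B   C   Outcome
1   0   0   700
0   1   0   900
0   0   1   450
1   1   0   280
0   1   1   100",
header = TRUE)

vars <- setdiff(names(df), "Outcome")

# Full 0/1 grid; the first column varies fastest.
grid <- expand.grid(rep(list(0:1), length(vars)))
names(grid) <- vars

# Row of each observed combination in the grid: 1 + A*1 + B*2 + C*4.
idx <- as.vector(as.matrix(df[vars]) %*% 2^(seq_along(vars) - 1)) + 1

# Start from all zeros, then write the observed outcomes into their rows.
grid$Outcome <- 0
grid$Outcome[idx] <- df$Outcome
grid
```

The rows come out in expand.grid() order rather than the sorted order that merge() produces, and the full grid is still built, so the same 2^n size limit applies.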

Edit:

Not sure whether it should go in a separate answer, but here is a more elegant way with the tidyr package:

library(tidyr)

complete(df, A, B, C, fill = list(Outcome = 0))

If you want to avoid typing all 60 or 70 column names (note that complete_() is deprecated in recent tidyr releases):

complete_(df, cols = setdiff(names(df), "Outcome"), fill = list(Outcome = 0))
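
With a tidyr release that supports tidy evaluation, the same thing can be sketched without complete_() by splicing the column names into complete() as symbols. This assumes the rlang package is available (it is a dependency of tidyr); `cols` and `res` are names introduced here for illustration.

```r
library(tidyr)

df <- read.table(text =
"A   B   C   Outcome
1   0   0   700
0   1   0   900
0   0   1   450
1   1   0   280
0   1   1   100",
header = TRUE)

# Every column except Outcome is a variable to expand over.
cols <- setdiff(names(df), "Outcome")

# rlang::syms() turns the names into symbols; !!! splices them into
# complete() as if they had been typed out as A, B, C.
res <- complete(df, !!!rlang::syms(cols), fill = list(Outcome = 0))
res
```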
