
R add all combinations of three values of a vector to a three-dimensional array

I have a data frame with two columns. The first one, "V1", indicates the objects on which the different items of the second column, "V2", are found, e.g.:

V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a","b","c","d","a","c","d","a","b","d","e")
df <- data.frame(V1, V2)

"A", for example, contains "a", "b", "c", and "d". What I am looking for is a three-dimensional array in which each of the three dimensions has length length(unique(V2)) (5 here) and carries the names "a" to "e" as dimnames.

For each unique value of V1 I want all possible combinations of three V2 items (e.g. for "A" these would be c("a", "b", "c"), c("a", "b", "d"), c("a", "c", "d"), and c("b", "c", "d")).
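As a small illustration of the combinations meant here, base R's combn() can list the unordered triples for a single object. The variable names items_A and triples_A are made up for this sketch.

```r
# Items found on object "A" in the example data
items_A <- c("a", "b", "c", "d")

# combn() returns one combination per column; t() puts one triple per row
triples_A <- t(combn(items_A, 3))
print(triples_A)

# With four items there are choose(4, 3) = 4 unordered triples
nrow(triples_A)
```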

Each of these three-item co-occurrences should be treated as a coordinate in the three-dimensional array, and the cell at that coordinate should be incremented by one, so that the array holds frequency counts. The outcome should be the following array:

ar <- array(data     = c(0,0,0,0,0,0,0,1,2,1,0,1,0,2,0,0,2,2,0,1,0,1,0,1,0,
                         0,0,1,2,1,0,0,0,0,0,1,0,0,1,0,2,0,1,0,1,1,0,0,1,0,
                         0,1,0,2,0,1,0,0,1,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,
                         0,2,2,0,1,2,0,1,0,1,2,1,0,0,0,0,0,0,0,0,1,1,0,0,0,
                         0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0),
            dim      = c(5, 5, 5),
            dimnames = list(c("a", "b", "c", "d", "e"),
                            c("a", "b", "c", "d", "e"),
                            c("a", "b", "c", "d", "e")))

I was wondering about the 3D symmetry of your result. It took me a while to understand that you want to have all permutations of all combinations.
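To see where the symmetry comes from, here is a minimal base R sketch for a single triple. It assumes that every ordering of the triple is counted once; perm_idx, triple, coords and arr are names invented for this illustration.

```r
items <- c("a", "b", "c", "d", "e")
arr <- array(0L, dim = c(5, 5, 5), dimnames = list(items, items, items))

# The six orderings of positions 1..3, written out by hand
perm_idx <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                  c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

triple <- c("a", "b", "d")

# Each row of coords is one (x, y, z) coordinate, given by dimnames
coords <- matrix(triple[perm_idx], ncol = 3)

# A character matrix with one column per dimension indexes single cells
arr[coords] <- arr[coords] + 1L

sum(arr)                            # six cells were incremented
all(arr == aperm(arr, c(2, 1, 3)))  # swapping two axes leaves arr unchanged
```

One unordered triple therefore fills six cells, which is why the expected array is unchanged under any reordering of its axes.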

library(gtools) #for the permutations

foo <- function(x) {
  #all combinations:
  combs <- combn(x, 3, simplify = FALSE) 
  #all permutations for each of the combinations:
  combs <- do.call(rbind, lapply(combs, permutations, n = 3, r = 3)) 
  #tabulate:
  do.call(table, lapply(asplit(combs, 2), factor, levels = letters[1:5]))
}

#apply grouped by V1, then sum the results
res <- Reduce("+", tapply(df$V2, df$V1, foo))

#check
all((res - ar)^2 == 0)
#[1] TRUE
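If installing gtools is not an option, the same counting can be sketched in base R alone. This is an alternative under the same assumption (every ordering of every triple is counted once per object), not the code from the answer above; perm_idx and arr are illustrative names.

```r
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")

items <- sort(unique(V2))
k <- length(items)
arr <- array(0L, dim = c(k, k, k), dimnames = list(items, items, items))

# The six orderings of positions 1..3
perm_idx <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                  c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

for (x in split(V2, V1)) {
  x <- unique(x)
  if (length(x) < 3) next  # combn() fails when fewer than 3 items exist
  for (triple in combn(x, 3, simplify = FALSE)) {
    # The six coordinates of one triple are distinct, so one assignment is safe
    coords <- matrix(triple[perm_idx], ncol = 3)
    arr[coords] <- arr[coords] + 1L
  }
}

arr["a", "b", "d"]  # "A" and "C" both contain a, b and d
```

The loop is slower than a vectorised tabulation on large data, but it makes each increment explicit.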

I used to use the data.table cross join CJ() to obtain the pairwise counts of all combinations of two different V2 items:

library(data.table)

res <- setDT(df)[,CJ(unique(V2), unique(V2)), V1][V1!=V2,
    .N, .(V1,V2)][order(V1,V2)]

This code creates a data frame res with three columns. V1 and V2 contain the respective items of V2 from the original data frame df, and N contains the count, i.e. how many times the two items appear together under the same value of V1 in the original data frame df.
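For comparison, the pairwise counts described here can also be sketched in base R with an incidence table. This is a different technique from the CJ() call and assumes each item appears at most once per object; inc and pair_counts are illustrative names.

```r
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")

# Object-by-item incidence table: 1 if the item occurs on the object
inc <- table(V1, V2)

# t(inc) %*% inc: entry [i, j] is the number of objects containing both i and j
pair_counts <- crossprod(inc)

# The diagonal holds plain item frequencies, not co-occurrences
diag(pair_counts) <- 0

pair_counts
```

Unlike the long-format result above, this stores every pair, including those with a count of zero.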

Now, I found that I could perform this crossjoin with three 'dimensions' as well by just adding another unique(V2) and adapting the rest of the code accordingly.

The result is a data frame with four columns. V1, V2, and V3 indicate the original V2 items and N again shows the number of mutual appearances with the same original V1 objects.

res <- setDT(df)[,CJ(unique(V2), unique(V2), unique(V2)), V1][V1!=V2 & V1 != V3 & V2 != V3,
    .N, .(V1,V2,V3)][order(V1,V2,V3)]

The advantage of this code is that empty combinations (those that never occur) are not stored. It worked with 1,000,000 unique values in V1 and over 600 unique items in V2, which would otherwise have required an extremely large dense array of 600 x 600 x 600 (216 million cells).
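A rough back-of-the-envelope check of that size, assuming exactly 600 items and R's usual storage of 4 bytes per integer and 8 bytes per double (attributes and temporary copies are ignored):

```r
n_items <- 600
cells <- n_items^3   # number of cells in a dense 600 x 600 x 600 array

gb_integer <- cells * 4 / 1e9  # size in GB if stored as integer counts
gb_double  <- cells * 8 / 1e9  # size in GB if stored as doubles

c(cells = cells, gb_integer = gb_integer, gb_double = gb_double)
```

Arithmetic on such an array typically creates further copies, so the working memory needed would be higher than these figures.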
