R: add all combinations of three values of a vector to a three-dimensional array

I have a data frame with two columns. The first one, "V1", indicates the objects on which the different items of the second column, "V2", are found, e.g.:

V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a","b","c","d","a","c","d","a","b","d","e")
df <- data.frame(V1, V2)

"A", for example, contains "a", "b", "c", and "d". What I am looking for is a three-dimensional array in which each dimension has length(unique(V2)) entries (with the names "a" to "e" as dimnames).

For each unique value of V1 I want all possible combinations of three V2 items (e.g. for "A" these would be c("a", "b", "c"), c("a", "b", "d"), c("a", "c", "d"), and c("b", "c", "d")).
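As a quick illustration of this enumeration step, base R's combn() lists the three-item combinations for a single object. This is only a sketch; items_A and combos_A are names introduced here, not part of the original data.

```r
# Items found on object "A" in the example data
items_A <- c("a", "b", "c", "d")

# All unordered combinations of three items, one character vector each
combos_A <- combn(items_A, 3, simplify = FALSE)

# Four items give choose(4, 3) = 4 combinations
length(combos_A)

# Collapse each combination into a short label for easy reading
labels_A <- vapply(combos_A, paste, character(1), collapse = "")
labels_A
# [1] "abc" "abd" "acd" "bcd"
```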

Each of these three-item co-occurrences should be treated as a coordinate in the three-dimensional array, and the value stored at that coordinate should count how often the co-occurrence appears. The outcome should be the following array:

ar <- array(data     = c(0,0,0,0,0,0,0,1,2,1,0,1,0,2,0,0,2,2,0,1,0,1,0,1,0,
                         0,0,1,2,1,0,0,0,0,0,1,0,0,1,0,2,0,1,0,1,1,0,0,1,0,
                         0,1,0,2,0,1,0,0,1,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,
                         0,2,2,0,1,2,0,1,0,1,2,1,0,0,0,0,0,0,0,0,1,1,0,0,0,
                         0,1,0,1,0,1,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0),
            dim      = c(5, 5, 5),
            dimnames = list(c("a", "b", "c", "d", "e"),
                            c("a", "b", "c", "d", "e"),
                            c("a", "b", "c", "d", "e")))

I was wondering about the 3D symmetry of your result. It took me a while to understand that you want all permutations of all combinations.

library(gtools) #for the permutations

foo <- function(x) {
  #all combinations:
  combs <- combn(x, 3, simplify = FALSE) 
  #all permutations for each of the combinations:
  combs <- do.call(rbind, lapply(combs, permutations, n = 3, r = 3)) 
  #tabulate:
  do.call(table, lapply(asplit(combs, 2), factor, levels = letters[1:5]))
}

#apply grouped by V1, then sum the results
res <- Reduce("+", tapply(df$V2, df$V1, foo))

#check
all((res - ar)^2 == 0)
#[1] TRUE
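For comparison, the same counting can be sketched in base R without gtools, by writing each of the 3! = 6 orderings of every combination into the array through character-matrix indexing. This is an alternative sketch, not the code above; the names items, orderings and res2 are introduced here, and the example data are rebuilt so the block runs on its own.

```r
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")

items <- sort(unique(V2))
res2 <- array(0L, dim = rep(length(items), 3),
              dimnames = list(items, items, items))

# The six orderings of three positions
orderings <- rbind(c(1, 2, 3), c(1, 3, 2), c(2, 1, 3),
                   c(2, 3, 1), c(3, 1, 2), c(3, 2, 1))

for (x in split(V2, V1)) {
  if (length(x) < 3) next          # no three-item combination possible
  combs <- combn(x, 3)             # one combination per column
  for (k in seq_len(ncol(combs))) {
    for (p in seq_len(nrow(orderings))) {
      # A 1 x 3 character matrix addresses one cell by its dimnames
      idx <- matrix(combs[orderings[p, ], k], nrow = 1)
      res2[idx] <- res2[idx] + 1L
    }
  }
}

# "a", "c", "d" occur together on objects "A" and "B"
res2["a", "c", "d"]
# [1] 2
```

The explicit loops are slower than tabulating, but they make it visible that every combination contributes to six cells, which is where the symmetry of the array comes from.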

I used to use the cross join CJ() to obtain the pairwise count of all combinations of two different V2 items:

library(data.table)  # provides setDT() and CJ()

res <- setDT(df)[,CJ(unique(V2), unique(V2)), V1][V1!=V2,
    .N, .(V1,V2)][order(V1,V2)]

This code creates a data frame res with three columns. V1 and V2 contain the respective V2 items from the original data frame df, and N contains the count (how many times the two items appear together under the same value of V1 in the original data frame df).
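Because the columns of the result reuse the names V1 and V2, it can be hard to tell the crossed columns from the grouping column. A variant with explicit names may be easier to follow. This is a sketch under my own naming: item1, item2, dt and pairs do not appear in the original code.

```r
library(data.table)

dt <- data.table(
  V1 = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C"),
  V2 = c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")
)

# Cross the items of each object with themselves, naming both columns
crossed <- dt[, CJ(item1 = unique(V2), item2 = unique(V2)), by = V1]

# Drop self-pairs, then count in how many objects each ordered pair occurs
pairs <- crossed[item1 != item2, .N, by = .(item1, item2)][order(item1, item2)]

# "a" and "d" occur together on all three objects
pairs[item1 == "a" & item2 == "d", N]
# [1] 3
```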

Now, I found that I could perform this cross join with three 'dimensions' as well, just by adding another unique(V2) and adapting the rest of the code accordingly.

The result is a data frame with four columns. V1, V2, and V3 contain the original V2 items, and N again shows how many original V1 objects the three items appear on together.

res <- setDT(df)[,CJ(unique(V2), unique(V2), unique(V2)), V1][V1!=V2 & V1 != V3 & V2 != V3,
    .N, .(V1,V2,V3)][order(V1,V2,V3)]

The advantage of this code is that empty combinations (those that do not appear at all) are never stored. It worked with 1,000,000 unique values in V1 and over 600 unique items in V2, which would otherwise have required an extremely large array of 600 x 600 x 600 (216,000,000 cells).
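To make the difference between the long table and the dense array concrete, here is a base R sketch on the small example data. expand.grid() and aggregate() stand in for CJ() and the data.table grouping, and the column names i, j, k and N are assumptions of this sketch.

```r
V1 <- c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C")
V2 <- c("a", "b", "c", "d", "a", "c", "d", "a", "b", "d", "e")

# Per object: all ordered triples of three different items
triples <- do.call(rbind, lapply(split(V2, V1), function(x) {
  u <- unique(x)
  g <- expand.grid(i = u, j = u, k = u, stringsAsFactors = FALSE)
  g[g$i != g$j & g$i != g$k & g$j != g$k, ]
}))

# Long format: one row per triple that actually occurs, with its count
long <- aggregate(N ~ i + j + k, data = cbind(triples, N = 1L), FUN = sum)

# Dense format: every cell exists, most of them zero
items <- sort(unique(V2))
dense <- array(0L, dim = rep(length(items), 3),
               dimnames = list(items, items, items))
dense[as.matrix(long[, c("i", "j", "k")])] <- long$N

nrow(long)      # rows in the long table
length(dense)   # cells in the dense array (5^3)
```

The long table grows with the number of triples that actually occur, while the dense array always has length(unique(V2))^3 cells, so converting to an array is only practical when the item set is small.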

