Determining All Possible Combinations of Items With a Grouping Variable, Allowing for Different Numbers of Items From Each Original Group
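This question is very similar to this question and this question, but it combines elements of both in a way that I can't figure out on my own.
I have the following list.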
original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12), group_3 = as.character(13:20))
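I want to create new groups based on these original groups. There is one constraint: each new group must contain an equal number of items from each original group. In addition, no item can be used more than once. For example, if we take one item from each original group, we might get the following new groups.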
Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 1
# option 1
new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 15)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 16)))
# option 3
new_groups <- list(group_1 = as.character(c(1, 8, 13)), group_2 = as.character(c(2, 7, 14)), group_3 = as.character(c(3, 9, 15)))
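Two things make what I want to do very tricky. First, I want to generate all possible combinations, because this operation is part of a larger function. Second, I want the option of having more than one item from each original group end up in each new group. Here is another example.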
Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 2
# option 1
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 19)))
# option 3
new_groups <- list(group_1 = as.character(c(1, 3, 7, 8, 13, 14)), group_2 = as.character(c(2, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
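If each original group contained at least 9 items, I could even create new groups that each contain 3 items from each original group.
Note that the original groups do not need to contain the same number of items for this process to work: the third original group contains more items than the other two.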
Also, note that the order of items within a new group does not matter. In other words,
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
is the same as
new_groups <- list(group_1 = as.character(c(2, 1, 8, 7, 14, 13)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
so in my final output I only want to report one of these options.
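Finally, note that the number of original groups will not always equal the number of new groups; they just happen to be equal in this example. I would also like to be able to specify how many new groups to create. Here is an example.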
Number_of_New_Groups <- 2
# option 1
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(5, 6, 11, 12, 17, 18)))
Number_of_New_Groups <- 3
# option 1
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
All solutions are welcome, but I would especially like to see a solution that uses only base R functions.
Thanks!
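As mentioned in the comments, a very large number of combinations may be involved here. However, one approach that works (assuming you have enough time/memory) is the following. This example only uses the first two elements of the original_groups list, allowing two elements from each group. Generalizing the final map to any number of groups would be straightforward, but this is just to illustrate the principle.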
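The first map sets up a vector of group indices, padded with NA to the length of the group (6), i.e. c(1,2,1,2,NA,NA) in this example, and finds all unique permutations of it. cross then combines each option for the first group with each option for the second group, and the final map uses these indices to split the elements into two groups.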
library(combinat) #for permn function
library(tidyverse) #purrr and dplyr
original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12))
no_items <- 2
no_groups <- length(original_groups) #i.e. 2 in this case
output <- map(original_groups,
~unique(permn(`length<-`(rep(seq_len(no_groups),
no_items),
length(.))))) %>%
cross() %>%
map(~list(c(original_groups$group_1[which(.$group_1 == 1)],
original_groups$group_2[which(.$group_2 == 1)]),
c(original_groups$group_1[which(.$group_1 == 2)],
original_groups$group_2[which(.$group_2 == 2)])))
head(output) #full output has 8100 elements
[[1]]
[[1]][[1]]
[1] "1" "3" "7" "9"
[[1]][[2]]
[1] "2" "4" "8" "10"
[[2]]
[[2]][[1]]
[1] "1" "3" "7" "9"
[[2]][[2]]
[1] "2" "5" "8" "10"
[[3]]
[[3]][[1]]
[1] "1" "4" "7" "9"
[[3]][[2]]
[1] "2" "5" "8" "10"
[[4]]
[[4]][[1]]
[1] "1" "4" "7" "9"
[[4]][[2]]
[1] "3" "5" "8" "10"
[[5]]
[[5]][[1]]
[1] "2" "4" "7" "9"
[[5]][[2]]
[1] "3" "5" "8" "10"
[[6]]
[[6]][[1]]
[1] "2" "4" "7" "9"
[[6]][[2]]
[1] "3" "6" "8" "10"
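Since the question asks for base R, the same label-permutation idea can be sketched without combinat or the tidyverse. This is a minimal sketch under the answer's own assumptions (two original groups, two items per new group). unique_arrangements() is a hypothetical helper written for illustration, and expand.grid() takes the place of purrr's cross(), which purrr 1.0.0 deprecated.

```r
original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12))
no_items <- 2
no_groups <- length(original_groups)

# All distinct orderings of a vector that may contain repeated values and NAs
# (a stand-in for unique(permn(...)) that never generates duplicates).
unique_arrangements <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (val in unique(v)) {                      # branch once per distinct value
    pos <- if (is.na(val)) which(is.na(v))[1] else which(v == val)[1]
    for (rest in unique_arrangements(v[-pos])) out[[length(out) + 1]] <- c(val, rest)
  }
  out
}

# One label vector per original group, e.g. c(1, 2, 1, 2, NA, NA): the label is
# the new group an item goes to; NA means the item is left unused.
label_vecs <- lapply(original_groups, function(g)
  `length<-`(rep(seq_len(no_groups), no_items), length(g)))
arrangements <- lapply(label_vecs, unique_arrangements)   # 90 per group here

# Every arrangement of group 1 paired with every arrangement of group 2.
grid <- expand.grid(lapply(arrangements, seq_along))

output <- lapply(seq_len(nrow(grid)), function(r) {
  idx <- vapply(grid, function(col) col[r], integer(1))
  lapply(seq_len(no_groups), function(k) {
    unname(unlist(Map(function(g, a, i) g[which(a[[i]] == k)],
                      original_groups, arrangements, idx)))
  })
})
length(output)
```

As in the original answer, the two new groups are labelled, so each unordered split appears twice among the 8100 results (once per labelling), giving 4050 distinct splits.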
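The prototype function below works for any number of groups, where a given number of observations ("numobs" in the function) is drawn without repetition from each group. I have added an example with four groups instead of three to illustrate.
The function was written on Windows using R version 4.2.1 and data.table version 1.14.2. The code is fairly complex; a single misplaced comma or a single compatibility issue could keep the function from running.
In the example given by the poster, there are 15 ways to draw two distinct members from each of the first two groups (length 6) and 28 ways to draw two distinct members from the third group (length 8). So there should be 15 * 15 * 28 = 6300 possible combinations.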
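The same arithmetic generalizes to any number of groups: the number of rows gc() should return is the product of choose(group size, numobs). A small sketch (count_draws() is a hypothetical helper) that reproduces the counts for the three example lists used below:

```r
# Rows gc() should return: the number of ways to draw `numobs` distinct items
# from each group independently, i.e. the product of choose(group size, numobs).
count_draws <- function(groups, numobs) prod(choose(lengths(groups), numobs))

original_groups  <- list(group_1 = as.character(1:6), group_2 = as.character(7:12),
                         group_3 = as.character(13:20))
original_groups2 <- c(original_groups, list(group_4 = as.character(21:24)))
original_groups3 <- list(group_1 = as.character(1:4), group_2 = as.character(5:9),
                         group_3 = as.character(10:16), group_4 = as.character(17:22))

count_draws(original_groups, 2)   # 15 * 15 * 28 = 6300
count_draws(original_groups2, 2)  # 6300 * choose(4, 2) = 37800
count_draws(original_groups3, 2)  # 6 * 10 * 21 * 15 = 18900
```

These agree with the dim() results reported for ans1, ans2 and ans3 below.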
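I added two more examples to show that this function works for any number of groups, with k objects taken from each group. However, the code slows down as the number of groups and objects grows, which may become a problem for large numbers of groups or objects.
Suggested citation: Harkness, Jeffrey (2022). Customized R combination function. Posted on stackoverflow.com on July 14, 2022.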
##original example from poster
(original_groups <- list(group_1 = as.character(1:6),
group_2 = as.character(7:12), group_3 = as.character(13:20)))
$group_1
[1] "1" "2" "3" "4" "5" "6"
$group_2
[1] "7" "8" "9" "10" "11" "12"
$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"
#testing 4 groups instead of 3
(original_groups2 <- list(group_1 = as.character(1:6),
group_2 = as.character(7:12), group_3 = as.character(13:20), group_4 = as.character(21:24)))
$group_1
[1] "1" "2" "3" "4" "5" "6"
$group_2
[1] "7" "8" "9" "10" "11" "12"
$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"
$group_4
[1] "21" "22" "23" "24"
#testing 4 different groups
(original_groups3 <- list(group_1 = as.character(1:4),
group_2 = as.character(5:9), group_3 = as.character(10:16), group_4 = as.character(17:22)))
$group_1
[1] "1" "2" "3" "4"
$group_2
[1] "5" "6" "7" "8" "9"
$group_3
[1] "10" "11" "12" "13" "14" "15" "16"
$group_4
[1] "17" "18" "19" "20" "21" "22"
require(data.table)
# Loading required package: data.table
# data.table 1.14.2 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
# Note: naming this function gc masks base::gc() (garbage collection) while it is defined
gc <- function(input_ob = original_groups, numgroups = length(input_ob), numobs = 2) {
ansdim <- NULL
for (k in 1:numgroups) {
rowcount <- 1
temp <- as.vector(input_ob[[k]]) # temporary vector for group k
newcombs <- data.frame(combn(temp, numobs)) # data frame of combinations for group k
newcombs <- transpose(newcombs)
newname <- paste0("ansnew", k)
ansdim[k] <- dim(newcombs)[1]
assign(newname, newcombs)
}
### To hold final answer
nm <- matrix(data = 0, nrow = prod(ansdim), ncol = numobs * numgroups, byrow = T)
## All possible combinations of the first two groups
combine <- NULL
nc <- ansdim[1]
nc <- data.frame(CJ(1:nc, 1:ansdim[2])) # instructions for which row numbers to combine
ntemp <- matrix(data = 0, nrow = ansdim[1] * ansdim[2], ncol = numobs * 2, byrow = T)
for (m in 1:dim(ntemp)[1]) {
newrow <- cbind(ansnew1[nc[m, 1], ], ansnew2[nc[m, 2], ])
ntemp[m, ] <- as.matrix(newrow[1, ])
}
fcom <- ntemp ### all combinations of first two groups
## All possible combinations of all groups
for (n in seq_len(numgroups)[-(1:2)]) { # groups 3, 4, ...; skipped for 2 groups (3:2 would count down)
nc <- ansdim[n]
nc <- data.frame(CJ(1:nrow(fcom), 1:ansdim[n])) # instructions for which row numbers to combine
ntemp <- matrix(data = 0, nrow = nrow(fcom) * ansdim[n], ncol = numobs * (n), byrow = T)
for (p in 1:dim(ntemp)[1]) {
frow <- fcom[nc[p, 1], ] ### First part of new row
srow <- as.character(as.vector(get(paste0("ansnew", n))[nc[p, 2], ])) ## second part of new row
newrow <- c(frow, srow)
ntemp[p, ] <- newrow
}
fcom <- ntemp
}
#
nm <- fcom
return(nm)
}
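gc() fills its result row by row inside nested for loops, which is likely where most of the run time goes as the groups grow. The sketch below is a vectorized base-R alternative; draw_table() is a hypothetical name chosen so as not to mask base::gc(). It assumes, as gc() produces via data.table::CJ(), that rows are ordered with the last group varying fastest, and it also handles exactly two groups.

```r
# Hypothetical vectorized alternative to gc(): every way to draw numobs items
# from each group, one draw per row, numobs consecutive columns per group.
draw_table <- function(input_ob, numobs = 2) {
  # one matrix per group; each row is one numobs-subset of that group (combn order)
  per_group <- lapply(input_ob, function(g) t(combn(g, numobs)))
  # row indices into each matrix; expand.grid() varies its FIRST column fastest,
  # so reverse, expand, and reverse back to make the LAST group vary fastest
  grid <- rev(expand.grid(rev(lapply(per_group, function(m) seq_len(nrow(m))))))
  # pick the indexed rows from every group and bind them side by side
  do.call(cbind, Map(function(m, i) m[i, , drop = FALSE], per_group, grid))
}

original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12),
                        group_3 = as.character(13:20))
ans_fast <- draw_table(original_groups)
dim(ans_fast)        # 6300 rows, 6 columns
head(ans_fast, 2)
```

If the ordering assumption holds, head(ans_fast) should match head(ans1) shown below.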
ans1 <- gc(input_ob = original_groups)
ans2 <- gc(input_ob = original_groups2)
ans3 <- gc(input_ob = original_groups3)
dim(ans1);head(ans1)
[1] 6300 6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1" "2" "7" "8" "13" "14"
[2,] "1" "2" "7" "8" "13" "15"
[3,] "1" "2" "7" "8" "13" "16"
[4,] "1" "2" "7" "8" "13" "17"
[5,] "1" "2" "7" "8" "13" "18"
[6,] "1" "2" "7" "8" "13" "19"
dim(ans2);head(ans2)
[1] 37800 8
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1" "2" "7" "8" "13" "14" "21" "22"
[2,] "1" "2" "7" "8" "13" "14" "21" "23"
[3,] "1" "2" "7" "8" "13" "14" "21" "24"
[4,] "1" "2" "7" "8" "13" "14" "22" "23"
[5,] "1" "2" "7" "8" "13" "14" "22" "24"
[6,] "1" "2" "7" "8" "13" "14" "23" "24"
dim(ans3);head(ans3)
[1] 18900 8
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1" "2" "5" "6" "10" "11" "17" "18"
[2,] "1" "2" "5" "6" "10" "11" "17" "19"
[3,] "1" "2" "5" "6" "10" "11" "17" "20"
[4,] "1" "2" "5" "6" "10" "11" "17" "21"
[5,] "1" "2" "5" "6" "10" "11" "17" "22"
[6,] "1" "2" "5" "6" "10" "11" "18" "19"
###Sample random rows from output to illustrate output is consistent
ans1[sample(nrow(ans1), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1" "3" "8" "12" "13" "17"
[2,] "3" "6" "8" "12" "19" "20"
[3,] "1" "3" "7" "11" "15" "19"
[4,] "1" "2" "9" "11" "17" "20"
[5,] "5" "6" "7" "8" "19" "20"
ans2[sample(nrow(ans2), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "2" "3" "8" "11" "13" "15" "22" "24"
[2,] "1" "2" "8" "9" "13" "19" "23" "24"
[3,] "1" "3" "11" "12" "13" "15" "21" "22"
[4,] "1" "4" "9" "10" "14" "16" "21" "22"
[5,] "2" "5" "7" "9" "15" "20" "22" "24"
ans3[sample(nrow(ans3), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "3" "4" "6" "7" "10" "15" "21" "22"
[2,] "1" "3" "5" "8" "12" "14" "18" "22"
[3,] "2" "4" "7" "8" "12" "16" "18" "19"
[4,] "2" "3" "5" "9" "11" "13" "17" "21"
[5,] "1" "2" "7" "8" "14" "16" "17" "18"
##shows that the output has no duplicated rows
ans1[duplicated(ans1),]
[,1] [,2] [,3] [,4] [,5] [,6]
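To show convincingly that every case is represented in the output, I wrote a few lines of code that randomly draw two objects from each of the OP's example groups and find the corresponding row in the output. This code could easily be put in a loop.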
#The next lines sample from original group and find corresponding row in answer
samp1 = sample(original_groups$group_1,size = 2, replace = F)
samp2 = sample(original_groups$group_2,size = 2, replace = F)
samp3 = sample(original_groups$group_3,size = 2, replace = F)
(samprow = c(samp1, samp2, samp3))
[1] "6" "4" "8" "9" "20" "18"
colnames(ans1) = c("v1","v2","v3","v4","v5","v6")
ans5 = data.frame(ans1)
ans5[ans5$v1 %in% samprow & ans5$v2 %in% samprow &ans5$v3 %in% samprow &ans5$v4 %in% samprow &ans5$v5 %in% samprow &ans5$v6 %in% samprow,]
v1 v2 v3 v4 v5 v6
5627 4 6 8 9 18 20
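Random sampling only spot-checks the output. Completeness can also be checked exhaustively: if every row is a valid draw, no two rows are the same draw, and the number of rows equals the product of choose(group size, numobs), then every possible draw must be present. A minimal sketch, assuming the column layout of gc() (numobs consecutive columns per group); is_complete_draw_table() is a hypothetical helper.

```r
# Hypothetical checker: TRUE when `ans` (numobs columns per group, groups in the
# same order as `groups`) lists every possible draw exactly once.
is_complete_draw_table <- function(ans, groups, numobs) {
  col_group <- rep(seq_along(groups), each = numobs)  # group owning each column
  row_key <- function(r) {
    parts <- vapply(seq_along(groups), function(k) {
      x <- r[col_group == k]
      # a valid draw uses distinct items that all come from group k
      if (!all(x %in% groups[[k]]) || anyDuplicated(x) > 0) return(NA_character_)
      paste(sort(x), collapse = ",")   # order within a group is irrelevant
    }, character(1))
    if (anyNA(parts)) NA_character_ else paste(parts, collapse = "|")
  }
  keys <- apply(ans, 1, row_key)
  expected <- prod(choose(lengths(groups), numobs))
  !anyNA(keys) && anyDuplicated(keys) == 0 && length(keys) == expected
}

# Tiny example: one item from each of two groups gives 3 * 4 = 12 possible draws.
groups <- list(a = c("1", "2", "3"), b = c("4", "5", "6", "7"))
full <- as.matrix(expand.grid(a = groups$a, b = groups$b, stringsAsFactors = FALSE))
is_complete_draw_table(full, groups, 1)        # TRUE
is_complete_draw_table(full[-1, ], groups, 1)  # FALSE: one draw is missing
```

With the objects above, is_complete_draw_table(ans1, original_groups, 2) should return TRUE if gc() enumerated every draw, checking all 6300 rows rather than a random sample.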
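Update July 16, 2022: generating all combinations
The following approach is written for the case of 3 groups. The code would need moderate modification to output the data in whatever form the user wants. It could be changed to accommodate other group sizes, to periodically dump data to an output file to avoid size problems, and so on, but for brevity I won't do that here. A rough calculation suggests that, with the example data in "original_groups", it should finish in about two hours or less on a reasonably fast laptop. For that reason it has been commented out.
Each unique sampled set (a row of the output of gc()) is represented in the code by an integer, so the sets do not have to be copied or repeated over and over. This helps mitigate data-size problems.
The code below builds a data.table in which each row gives 3 integers corresponding to three unique sampled sets. Each row works like a set of instructions that can be used to construct the final result in whatever form the user wants (a list, etc.).
After the code had run for about 15 minutes, the answer frame contained 697,152 combinations after duplicates were removed. The duplicate-removal code is taken from the Stack Overflow post listed under Reference below; all other code is original. The code and sample output are shown below.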
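The size of a complete result can be worked out in closed form, which gives a target to compare with the 697,152 rows above. The sketch below (count_new_groupings() is a hypothetical helper) rests on the assumptions listed in its comments. Under them, original_groups with two items per group and three new groups gives 3,402,000, which would suggest that the 15-minute run had collected only part of the full set. The same formula gives 13,824 for the 4 × 4 example in the July 17 update, the count reported there.

```r
# Closed-form count of complete answers, assuming:
#   * each of the n_new new groups takes exactly numobs items from every original group,
#   * no item is used twice (leftover items are simply left out), and
#   * new groups are unordered (swapping two new groups gives the same answer).
count_new_groupings <- function(sizes, numobs, n_new) {
  if (any(sizes < numobs * n_new)) return(0)
  # Per original group: deal numobs items to new group 1, then to group 2, ...
  # i.e. choose(s, numobs) * choose(s - numobs, numobs) * ... (n_new factors).
  per_group <- vapply(sizes, function(s)
    prod(choose(s - numobs * (seq_len(n_new) - 1), numobs)), numeric(1))
  prod(per_group) / prod(seq_len(n_new))   # divide by n_new! to drop the labels
}

count_new_groupings(c(6, 6, 8), numobs = 2, n_new = 3)     # 3402000
count_new_groupings(c(4, 4, 4, 4), numobs = 1, n_new = 4)  # 13824
```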
Reference:
Stack Overflow commenter Thomas (2014). Accessed July 16, 2022. URL: Deleting reversed duplicates with R
# inans = ans1 #object from gc() function with all combinations
# inans = data.table(inans)
# colvec = colnames(inans)
# inans$index = 1:nrow(inans)
#
# for(i in 1:nrow(inans)){ #For every possible sampled sequence
# sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
# ntemp = as.character(as.vector(inans[i,..colvec]))
# nin = inans$index[i]
# sf = inans
# for(k in colvec){ #Drop rows that share any item with the first set
# sf = sf[!get(k) %in% ntemp]
# }
#
# ntemp2 = NULL; sin = NULL
# for(j in seq_len(nrow(sf))){ #For each second set disjoint from the first
# ntemp2 = as.character(as.vector(sf[j,..colvec]))
# sin = sf$index[j]
# sf2 = sf
# for(m in colvec){
# sf2 = sf2[!get(m) %in% ntemp2] #Drop rows that share any item with the second set
# }
# if(dim(sf2)[1] > 0){
# newframe = data.table(nin, sin, sf2$index) #Store row numbers for each combination
# if(i == 1){
# ansframe = newframe
# }else{
# ansframe = rbind(ansframe, newframe) #start process over for next sampled set
# }}}}
#
# ##Remove duplicate groups
# ##Code Source:
# ##https://stackoverflow.com/questions/22756392/deleting-reversed-duplicates-with-r
# ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=','))),]
#
# #input data.table of all possible sample sets
# #6300 total here - made from gc() function above
# #index is an integer to represent each sequence
# head(inans)
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 14 1
# 2: 1 2 7 8 13 15 2
# 3: 1 2 7 8 13 16 3
# 4: 1 2 7 8 13 17 4
# 5: 1 2 7 8 13 18 5
# 6: 1 2 7 8 13 19 6
#
#
# #data.table that holds instructions to build final output
# #each row gives row numbers for a feasible combination
# head(ansframe)
# nin sin V3
# 1: 1 6300 4046
# 2: 1 6300 4047
# 3: 1 6300 4048
# 4: 1 6300 4051
# 5: 1 6300 4052
# 6: 1 6300 4055
#
# #answer frame size: each row represents a combination
# dim(ansframe)
# [1] 697152 3
#
#
# ###Code to print a selection of valid combinations from the
# ###output instructions given by the code above.
# forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
# for(i in forind){print(inans[as.numeric(ansframe[i])])}
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 14 1
# 2: 5 6 11 12 19 20 6300
# 3: 3 4 9 10 15 16 4046
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 9 11 16 17 4079
# 3: 5 6 10 12 14 20 6257
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 9 12 17 20 4113
# 3: 5 6 10 11 18 19 6242
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 10 12 14 19 4156
# 3: 5 6 9 11 16 18 6180
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 11 12 17 18 4195
# 3: 5 6 9 10 16 20 6154
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 9 11 14 17 4490
# 3: 4 6 10 12 16 18 5844
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 9 12 16 19 4529
# 3: 4 6 10 11 17 18 5819
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 10 11 18 20 4563
# 3: 4 6 9 12 17 19 5792
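The duplicate-removal step keys each row with paste(sort(x), collapse = ''). Without a separator, two different sets of row indices can produce the same string once the indices run into the thousands, and the later set would then be dropped as a false duplicate. A minimal demonstration with made-up index triples (not taken from ansframe), plus two ways around it: a separator in the key, or calling duplicated() directly on a row-sorted matrix.

```r
key_no_sep <- function(x) paste(sort(x), collapse = "")
key_comma  <- function(x) paste(sort(x), collapse = ",")

a <- c(1, 234, 5678)   # made-up index triples for illustration
b <- c(12, 34, 5678)
key_no_sep(a) == key_no_sep(b)   # TRUE: both keys are "12345678"
key_comma(a) == key_comma(b)     # FALSE: "1,234,5678" vs "12,34,5678"

# Alternative without strings: sort each row, then let duplicated() compare rows.
m <- rbind(c(3, 1, 2), c(1, 2, 3), a, b)
sorted <- t(apply(m, 1, sort))
m[!duplicated(sorted), , drop = FALSE]   # keeps rows 1, 3 and 4
```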
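Update July 17, 2022: all combinations with a variable number of output groups
The following approach is written for the case of 2, 3 or 4 output groups. In the test cases the code ran for roughly 15 minutes with two groups, but 4 groups may take much longer, depending on the size and number of the input groups.
This example shows 4 input groups with 4 output groups and 1 object from each group, but the user can also choose 2 or 3 output groups. The "numobs" argument of the gc() function specifies how many objects to take from each group, and the "groupsout" object specifies the number of output groups. For this example, the code below found 13,824 combinations in under 10 minutes.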
(original_groups4 <- list(group_1 = as.character(1:4), group_2 = as.character(5:8), group_3 = as.character(9:12), group_4 = as.character(13:16)))
$group_1
[1] "1" "2" "3" "4"
$group_2
[1] "5" "6" "7" "8"
$group_3
[1] "9" "10" "11" "12"
$group_4
[1] "13" "14" "15" "16"
ans1 = gc(input_ob = original_groups4, numobs = 1)
###Find all combinations
###groupsout specifies the number of output groups
groupsout = 4 #Number of groups in the output: choose 2,3, or 4
inans = ans1 #dev object from gc() function with all combos
inans = data.table(inans) #convert the gc() matrix to a data.table
colvec = colnames(inans)
inans$index = 1:nrow(inans)
if(groupsout == 2){ #groupsout = 2 case
for(i in 1:nrow(inans)){ #For each first set, keep only the rows disjoint from it
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
if(dim(sf)[1] > 0){
newframe = data.table(nin, sf$index) #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}
if(groupsout == 3){ #groupsout = 3 case
for(i in 1:nrow(inans)){ #For each first set, keep only the rows disjoint from it
sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
ntemp2 = NULL; sin = NULL #Create last group and write to output frame
for(j in seq_len(nrow(sf))){ #For each second set, keep only rows disjoint from both sets
ntemp2 = as.character(as.vector(sf[j,..colvec]))
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
if(dim(sf2)[1] > 0){
newframe = data.table(nin, sin, sf2$index) #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}
###groupsout = 4 case
if(groupsout == 4){
for(i in 1:nrow(inans)){ #For each first set, keep only the rows disjoint from it
sf = NULL;sf2 = NULL;sf3 = NULL
ntemp = NULL; nin = NULL;fin = NULL
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
####Create next group
ntemp2 = NULL; sin = NULL
for(j in seq_len(nrow(sf))){ #For each second set, keep only rows disjoint from both sets
ntemp2 = as.character(as.vector(sf[j,..colvec]))
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
####Create last group and write to output frame
fin = NULL;ntemp3 = NULL
for(p in seq_len(nrow(sf2))){ #For each third set, keep only rows disjoint from all three sets
ntemp3 = as.character(as.vector(sf2[p,..colvec]))
fin = sf2$index[p]
sf3 = sf2
for(r in colvec){
sf3 = sf3[!get(r) %in% ntemp3]
}
if(dim(sf3)[1] > 0){
newframe = data.table(nin, sin,fin, sf3$index) #Store row numbers
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}}#end if groupsout = 4
##Remove duplicate groups
ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=','))),]
#for illustration
dim(ansframe)
[1] 13824 4
##View subset of combination output to check for valid results
forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
for(i in forind){print(inans[as.numeric(ansframe[i])])}
V1 V2 V3 V4 index
1: 1 5 9 13 1
2: 4 8 12 16 256
3: 3 7 11 15 171
4: 2 6 10 14 86
V1 V2 V3 V4 index
1: 1 5 9 15 3
2: 2 6 12 16 96
3: 3 7 10 14 166
4: 4 8 11 13 249
V1 V2 V3 V4 index
1: 1 5 9 16 4
2: 2 7 12 14 110
3: 3 6 11 15 155
4: 4 8 10 13 245
V1 V2 V3 V4 index
1: 1 5 10 13 5
2: 2 8 12 14 126
3: 3 7 9 16 164
4: 4 6 11 15 219
V1 V2 V3 V4 index
1: 1 5 10 15 7
2: 2 6 11 16 92
3: 3 8 12 14 190
4: 4 7 9 13 225
V1 V2 V3 V4 index
1: 1 5 10 16 8
2: 2 7 11 15 107
3: 3 6 9 14 146
4: 4 8 12 13 253
V1 V2 V3 V4 index
1: 1 5 11 13 9
2: 2 8 10 15 119
3: 3 6 12 16 160
4: 4 7 9 14 226
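I suggest the following solution, which uses only base R. It consists of these functions.
permutations, a function that computes all possible combinations of the elements of a vector (or list) vec taken sublen at a time (like the combn function):
permutations <- function(vec, sublen, prev_vec=NULL){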
out_list <- list()
if(sublen==1){
for(v in vec){
out_list <- append(out_list,list(append(prev_vec,list(v))))
}
} else {
for (i in 1:(length(vec)-sublen+1)){
v <- vec[1]
prev_vec0 <- c(prev_vec,vec[1])
vec <- vec[2:length(vec)]
perm_list <- permutations(
vec=vec,
sublen=sublen-1,
prev_vec=prev_vec0
)
out_list <- append(out_list,perm_list)
}
}
return(out_list)
}
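Despite its name, permutations() enumerates combinations, and judging from the recursion it does so in the same order as base R's combn(). If only the subsets themselves are needed, combn(..., simplify = FALSE) returns them directly as character vectors. Whether it can replace permutations() unchanged in the later steps is an unverified assumption, since permutations() returns each subset as a list rather than a vector.

```r
vec <- as.character(1:6)
subsets <- combn(vec, 2, simplify = FALSE)  # list of all 2-element subsets
length(subsets)                             # choose(6, 2) = 15
subsets[[1]]                                # "1" "2"
subsets[[15]]                               # "5" "6"
```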
find_matrix, a function that extracts the matrices from a deeply nested list (source):
find_matrix <- function(x) {
if (is.matrix(x))
return(list(x))
if (!is.list(x))
return(NULL)
unlist(lapply(x, find_matrix), FALSE)
}
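compatible_rows, a function that, given an output vector, extracts the subset of rows of a data frame that can still be used to create further output vectors (the rows that share no element with it):
compatible_rows <- function(df,row_value){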
row_ids <- c()
if(is.null(nrow(df))){
return(NULL)
} else {
for (row_id in 1:nrow(df)){
row_ids <- c(row_ids,!any(row_value %in% df[row_id,]))
}
return(df[which(row_ids),])
}
}
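compatible_rows() tests one row at a time and grows row_ids with c(). A vectorized sketch of the same filter follows (compatible_rows_vec() is a hypothetical name). It is not a drop-in replacement: the original returns a plain vector when exactly one row remains, and new_groups_list() relies on is.null(nrow(df)) to detect that case, so switching would also mean changing that check.

```r
# Hypothetical vectorized variant: keep the rows of a matrix that share no element
# with row_value. drop = FALSE keeps a one-row result as a matrix.
compatible_rows_vec <- function(m, row_value) {
  keep <- !apply(m, 1, function(r) any(r %in% row_value))
  m[keep, , drop = FALSE]
}

m <- rbind(c("1", "7"), c("2", "8"), c("1", "9"), c("3", "7"))
compatible_rows_vec(m, c("1", "7"))   # only the row "2" "8" is left
```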
new_groups_list, a function that computes all possible output matrices:
new_groups_list <- function(df, prev_df=NULL, lvl=-1, verbose=F){
lvl <- lvl+1
results_list <- list()
if(is.null(nrow(df))){
if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat("returned\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
return(prev_df0)
}
if(nrow(df)==0){
if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat("returned\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
return(prev_df0)
}
for(row_id in 1:nrow(df)){
if(verbose==T) cat(paste("-- lvl",lvl,"cycle",row_id,"--\n"))
if(verbose==T) cat("initial results list\n")
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
if(verbose==T) cat("df in\n")
if(verbose==T) assign("last_df",df,envir = .GlobalEnv)
if(verbose==T) print(df)
if(verbose==T) cat("\n")
if(is.null(nrow(df))){
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat(paste0("--BRANCH END (MID lvl ",lvl,")--\n"))
if(verbose==T) cat("returned\n")
results_list <- append(results_list,list(prev_df0))
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
return(results_list)
}
considered_row <- df[1,]
if(verbose==T) assign("last_considered_row",considered_row,envir = .GlobalEnv)
if(verbose==T) cat("considered rows\n")
if(verbose==T) print(considered_row)
if(verbose==T) cat("\n")
df <- df[2:nrow(df),]
if(verbose==T) assign("last_df",df,envir = .GlobalEnv)
if(verbose==T) cat("df without considered rows\n")
if(verbose==T) print(df)
if(verbose==T) cat("\n")
prev_df0 <- rbind(prev_df,considered_row)
rownames(prev_df0) <- NULL
if(verbose==T) assign("last_prev0",prev_df0,envir = .GlobalEnv)
if(verbose==T) cat("collected considered rows\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
comp_df <- compatible_rows(df,considered_row)
if(verbose==T) assign("last_comp_df",comp_df,envir = .GlobalEnv)
if(verbose==T) cat("compatible rows in df\n")
if(verbose==T) print(comp_df)
if(verbose==T) cat("\n")
if(verbose==T) cat(paste(">>> GOING TO LVL",lvl+1,"\n\n"))
new_rows <- new_groups_list(
comp_df,
prev_df=prev_df0,
lvl=lvl,
verbose=verbose
)
if(verbose==T) cat(paste0("--ROOT (lvl ",lvl,")--\n"))
if(verbose==T) cat("result received from branch\n")
if(verbose==T) print(new_rows)
if(verbose==T) cat("\n")
results_list <- append(results_list,list(new_rows))
if(verbose==T) cat("results list\n")
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
}
return(results_list)
}
create_new_groups, which wraps all the other functions and outputs the entire list of possible solutions:
create_new_groups <- function(original_groups, max_output = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
stop("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
# config_test <<- config
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
# out_test <<- new_groups
results_list[[config_id]] <- new_groups
}
return(results_list)
}
Given a simple input such as
original_groups <- list(
group_1 = as.character(1:2),
group_2 = as.character(3:4),
group_3 = as.character(5:7)
)
create_new_groups(original_groups)
gives the following output:
> create_new_groups_modified(original_groups)
[[1]]
[[1]][[1]]
[,1] [,2] [,3]
[1,] "1" "3" "5"
[2,] "2" "4" "6"
[[1]][[2]]
[,1] [,2] [,3]
[1,] "1" "3" "5"
[2,] "2" "4" "7"
[[1]][[3]]
[,1] [,2] [,3]
[1,] "2" "3" "5"
[2,] "1" "4" "6"
[[1]][[4]]
[,1] [,2] [,3]
[1,] "2" "3" "5"
[2,] "1" "4" "7"
[[1]][[5]]
[,1] [,2] [,3]
[1,] "1" "4" "5"
[2,] "2" "3" "6"
[[1]][[6]]
[,1] [,2] [,3]
[1,] "1" "4" "5"
[2,] "2" "3" "7"
[[1]][[7]]
[,1] [,2] [,3]
[1,] "2" "4" "5"
[2,] "1" "3" "6"
[[1]][[8]]
[,1] [,2] [,3]
[1,] "2" "4" "5"
[2,] "1" "3" "7"
[[1]][[9]]
[,1] [,2] [,3]
[1,] "1" "3" "6"
[2,] "2" "4" "7"
[[1]][[10]]
[,1] [,2] [,3]
[1,] "2" "3" "6"
[2,] "1" "4" "7"
[[1]][[11]]
[,1] [,2] [,3]
[1,] "1" "4" "6"
[2,] "2" "3" "7"
[[1]][[12]]
[,1] [,2] [,3]
[1,] "2" "4" "6"
[2,] "1" "3" "7"
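In addition, the create_new_groups function creates a global variable, config_test, which stores all possible configurations for the given list of groups (i.e. original_groups). For example, for the previous problem, config_test equals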
> config_test
[[1]]
[1] 1 3 2
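So for this problem only one output configuration is possible, c(1, 3, 2): each new group takes 1 element (subLen) from each original group, so it has 3 elements in total (len), and at most 2 new groups (numGroups) can be formed.
To give a slightly more complex example, for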
original_groups <- list(
group_1 = as.character(1:4),
group_2 = as.character(5:8),
group_3 = as.character(9:13)
)
config_test will be equal to
> config_test
[[1]]
[1] 1 3 4
[[2]]
[1] 2 6 2
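Since config_test depends only on the group sizes, the available configurations can be predicted before running the search. A small sketch that mirrors the loop at the top of create_new_groups() (configs() is a hypothetical name); each entry is c(subLen, len, numGroups), matching the names used for out_list.

```r
# For every number of items per original group s (1 .. floor(min size / 2)):
# c(subLen = s, len = s * number of groups, numGroups = floor(min size / s)).
configs <- function(original_groups) {
  min_len <- min(lengths(original_groups))
  k <- length(original_groups)
  lapply(seq_len(floor(min_len / 2)), function(s) c(s, s * k, floor(min_len / s)))
}

small  <- list(group_1 = as.character(1:2), group_2 = as.character(3:4),
               group_3 = as.character(5:7))
bigger <- list(group_1 = as.character(1:4), group_2 = as.character(5:8),
               group_3 = as.character(9:13))
configs(small)   # list(c(1, 3, 2))
configs(bigger)  # list(c(1, 3, 4), c(2, 6, 2))
```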
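I did some testing, and this approach should work for any number of groups of any length; the output should always consist of non-repeating matrices.
Sorry that the explanation is short; if I have time in the next few days, I will try to add some comments.
EDIT
A simple way to output only the configurations characterized by a specific number of elements from the original groups is to change create_new_groups as follows: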
create_new_groups_modified <- function(original_groups, max_output = NULL, elements_from_original = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
stop("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
# if `elements_from_original` is not set, output all possible combinations
if(is.null(elements_from_original)){
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
# config_test <<- config
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
# out_test <<- new_groups
results_list[[config_id]] <- new_groups
}
} else {
# if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
if (length(config_id)!=0){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
results_list[[1]] <- new_groups
} else {
stop("value of elements_from_original not available: check config_test to see available configurations")
}
}
return(results_list)
}
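The elements_from_original argument of the function sets the number of elements to take from each original group; if that configuration is available, the output contains only the matrices that follow it.
EDIT 2
Outputting matrices made up of a specific number of groups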
select_matrices_by_number_output_groups, a function that outputs only matrices with n_output_groups rows (matrices with more rows are cut down to that many):
select_matrices_by_number_output_groups <- function(l,n_output_groups){
# Filter out matrices having less rows than `n_output_groups`
out_l <- l[which(
sapply(
l,
# function(x) check_matrix_by_number_output_groups(x,n_output_groups)
function(mtr){
if(nrow(mtr)<n_output_groups) return(F)
else return(T)
}
)
)]
# Cut-off rows from matrices having more rows than `n_output_groups`
out_l <- lapply(
out_l,
function(x) head(x,n_output_groups)
)
# Keep only unique elements (i.e., matrices)
out_l <- unique(out_l)
return(out_l)
}
create_new_groups, modified so that it includes the select_matrices_by_number_output_groups function:
create_new_groups_modified_2 <- function(original_groups, max_output = NULL, elements_from_original = NULL, n_output_groups = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
stop("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
# if `elements_from_original` is not set, output all possible combinations
if(is.null(elements_from_original)){
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
results_list[[config_id]] <- new_groups
}
} else {
# if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
if (length(config_id)!=0){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- lapply(
new_groups,
function(x) {
dimnames(x) <- NULL
return(x)
}
)
if(is.null(n_output_groups)){
new_groups <- new_groups[which(sapply(new_groups, nrow) == config[3])]
} else if (n_output_groups > config[3]){
stop("value n_output_groups higher than max number of new groups for this configuration: check config_test to see available configurations")
} else {
new_groups <- select_matrices_by_number_output_groups(new_groups,n_output_groups)
}
# results_list[[1]] <- new_groups
results_list <- new_groups
} else {
stop("value of elements_from_original not available: check config_test to see available configurations")
}
}
return(results_list)
}
This is a job for expand.grid + combn (only the first 5 rows are shown):
n <- 1
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))
group_1 group_2 group_3
1 1 7 13
2 2 7 13
3 3 7 13
4 4 7 13
5 5 7 13
When n = 2:
n <- 2
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))
group_1 group_2 group_3
1 1, 2 7, 8 13, 14
2 1, 3 7, 8 13, 14
3 1, 4 7, 8 13, 14
4 1, 5 7, 8 13, 14
5 1, 6 7, 8 13, 14
You could write a simple function:
generate_all <- function(lst, n){
expand.grid(lapply(lst, combn, n, simplify = FALSE))
}
head(generate_all(original_groups, 3))
group_1 group_2 group_3
1 1, 2, 3 7, 8, 9 13, 14, 15
2 1, 2, 4 7, 8, 9 13, 14, 15
3 1, 2, 5 7, 8, 9 13, 14, 15
4 1, 2, 6 7, 8, 9 13, 14, 15
5 1, 3, 4 7, 8, 9 13, 14, 15
head(generate_all(original_groups, 4))
group_1 group_2 group_3
1 1, 2, 3, 4 7, 8, 9, 10 13, 14, 15, 16
2 1, 2, 3, 5 7, 8, 9, 10 13, 14, 15, 16
3 1, 2, 3, 6 7, 8, 9, 10 13, 14, 15, 16
4 1, 2, 4, 5 7, 8, 9, 10 13, 14, 15, 16
5 1, 2, 4, 6 7, 8, 9, 10 13, 14, 15, 16
6 1, 2, 5, 6 7, 8, 9, 10 13, 14, 15, 16
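generate_all() returns list-columns in which each cell holds one draw from one group. To compare its result with the matrix returned by gc() in another answer, the list-columns can be flattened into a character matrix. as_draw_matrix() is a hypothetical helper, and generate_all() is repeated so the block runs on its own. Like gc(), this enumerates draws per group; it does not yet split the items into disjoint new groups.

```r
generate_all <- function(lst, n) {
  expand.grid(lapply(lst, combn, n, simplify = FALSE))
}
# Each list-column becomes a (rows x n) matrix; binding them gives one row per
# draw with n columns per original group.
as_draw_matrix <- function(grid) {
  do.call(cbind, lapply(grid, function(col) do.call(rbind, col)))
}

original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12),
                        group_3 = as.character(13:20))
m <- as_draw_matrix(generate_all(original_groups, 2))
dim(m)   # 6300 rows, 6 columns, i.e. prod(choose(c(6, 6, 8), 2))
m[1, ]   # "1" "2" "7" "8" "13" "14"
```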
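If I understand your question correctly, the task seems to be finding all possible partitions of each group into blocks of a given size, recombining those blocks across all groups to form new sets, and keeping only one of each set of isomorphic results. In that case, I guess the key step is generating all exclusive partitions by size, which seems related to the permutation problem.
Since a base R option is preferred by the OP, maybe we can try the code below. It defines two functions: permM, which generates all permutations of the vector x into blocks of the given group size M, and f, which produces the desired output, i.e. all possible combinations of new groups, stored in a nested list.
# generate all permutations of x with given size M for each group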
permM <- function(x, M) {
if (length(x) == M) {
return(list(x))
}
S <- combn(x, M, simplify = FALSE)
res <- c()
for (k in seq_along(S)) {
z <- Recall(x[!x %in% S[[k]]], M)
res <- c(res, lapply(z, c, S[[k]]))
}
res
}
# create all possible combinations of new groups
f <- function(lst, K) {
nms <- names(lst)
l <- lapply(lst, combn, m = length(lst) * K)
g <- apply(
expand.grid(lapply(choose(lengths(lst), length(lst) * K), seq)),
1,
function(idx) {
Map(function(p, q) l[[p]][, q], seq_along(idx), unlist(idx))
}
)
x <- do.call(
c,
lapply(
g,
function(v) {
apply(
expand.grid(lapply(v, permM, M = K)),
1,
function(...) {
setNames(
asplit(do.call(rbind, lapply(..., matrix, K)), 2),
nms
)
}
)
}
)
)
# remove the isomorphics but keep one of them only
x[
!duplicated(lapply(
x,
function(v) {
unname(sort(sapply(v, function(z) toString(sort(z)))))
}
))
]
}
Given a smaller data sample,
lst <- list(grp1 = 1:4, grp2 = 5:9)
as the original_groups list, we run
r1 <- f(lst,1)
r2 <- f(lst,2)
and we see snapshots of the results like the following:
> head(r1)
[[1]]
[[1]]$grp1
[1] 2 6
[[1]]$grp2
[1] 1 5
[[2]]
[[2]]$grp1
[1] 1 6
[[2]]$grp2
[1] 2 5
[[3]]
[[3]]$grp1
[1] 3 6
[[3]]$grp2
[1] 1 5
[[4]]
[[4]]$grp1
[1] 1 6
[[4]]$grp2
[1] 3 5
[[5]]
[[5]]$grp1
[1] 4 6
[[5]]$grp2
[1] 1 5
[[6]]
[[6]]$grp1
[1] 1 6
[[6]]$grp2
[1] 4 5
and
> head(r2)
[[1]]
[[1]]$grp1
[1] 3 4 7 8
[[1]]$grp2
[1] 1 2 5 6
[[2]]
[[2]]$grp1
[1] 2 4 7 8
[[2]]$grp2
[1] 1 3 5 6
[[3]]
[[3]]$grp1
[1] 2 3 7 8
[[3]]$grp2
[1] 1 4 5 6
[[4]]
[[4]]$grp1
[1] 1 4 7 8
[[4]]$grp2
[1] 2 3 5 6
[[5]]
[[5]]$grp1
[1] 1 3 7 8
[[5]]$grp2
[1] 2 4 5 6
[[6]]
[[6]]$grp1
[1] 1 2 7 8
[[6]]$grp2
[1] 3 4 5 6