
Finding feasible combinations of columns in an R data frame (combinatorics)

I have the following challenge: a data frame with 218 observations (rows) and 218 variables (columns), where every value is either TRUE or FALSE. I need to find the combinations of variables (columns) that are TRUE together, i.e. all TRUE in the same row, in at least 2 rows.

Here is a little example:

data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) = paste("item_", 1:5, sep = "")
rownames(data) = paste("Process_", 1:3, sep = "")
data["Process_1",c("item_1","item_2","item_3")] = TRUE
data["Process_2",c("item_2","item_3")] = TRUE
data["Process_3",c("item_1","item_2","item_3","item_4","item_5")] = TRUE
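With this data, whether a given combination is feasible can be checked directly: count the rows in which all of its columns are TRUE and compare that count with 2. A minimal sketch in base R, where the helper name `support` is hypothetical (not from any package) and the example data is rebuilt so the block runs on its own:

```r
# Example data from the question
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) <- paste("item_", 1:5, sep = "")
rownames(data) <- paste("Process_", 1:3, sep = "")
data["Process_1", c("item_1", "item_2", "item_3")] <- TRUE
data["Process_2", c("item_2", "item_3")] <- TRUE
data["Process_3", c("item_1", "item_2", "item_3", "item_4", "item_5")] <- TRUE

# Hypothetical helper: number of rows in which every column in `combo` is TRUE
support <- function(data, combo) {
  # rowSums counts the TRUE values per row; a row matches when all are TRUE
  sum(rowSums(data[, combo, drop = FALSE]) == length(combo))
}

support(data, c("item_1", "item_2", "item_3"))  # 2 (Process_1, Process_3): feasible
support(data, c("item_1", "item_4"))            # 1 (Process_3 only): not feasible
```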

For this example, the feasible combinations (the result I am trying to get) are:

c1: item_1, item_2, item_3

c2: item_2, item_3

c3: item_1, item_2

c4: item_1, item_3
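These four combinations follow a pattern that also scales to the 218 × 218 case: a combination is TRUE in at least 2 rows exactly when it is a subset of the items some pair of rows has in common. Here Process_1 and Process_3 share item_1, item_2 and item_3, and c1 to c4 are precisely the subsets of that set with 2 or more items. Intersecting every pair of rows costs choose(218, 2) = 23,653 comparisons, whereas enumerating subsets of 218 columns is out of reach. The sketch below is a swapped-in technique, not the answer's approach; it uses base R only, and the helper name `maximal_shared_sets` is hypothetical. It returns just the maximal shared sets:

```r
# Example data from the question, converted to a logical matrix
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) <- paste("item_", 1:5, sep = "")
rownames(data) <- paste("Process_", 1:3, sep = "")
data["Process_1", c("item_1", "item_2", "item_3")] <- TRUE
data["Process_2", c("item_2", "item_3")] <- TRUE
data["Process_3", c("item_1", "item_2", "item_3", "item_4", "item_5")] <- TRUE
m <- as.matrix(data)

# Hypothetical helper: largest item sets that are TRUE together in >= 2 rows
# (assumes m has at least 2 rows)
maximal_shared_sets <- function(m) {
  pairs <- combn(nrow(m), 2)  # one column per pair of rows (i, j)
  shared <- lapply(seq_len(ncol(pairs)), function(k) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    colnames(m)[m[i, ] & m[j, ]]  # items TRUE in both rows, in column order
  })
  # Keep sets of at least 2 items and remove duplicates
  shared <- unique(shared[lengths(shared) >= 2])
  # Drop any set that is strictly contained in another set
  contained <- vapply(seq_along(shared), function(a) {
    any(vapply(seq_along(shared), function(b) {
      b != a && all(shared[[a]] %in% shared[[b]])
    }, logical(1)))
  }, logical(1))
  shared[!contained]
}

maximal_shared_sets(m)
# returns list(c("item_1", "item_2", "item_3"))
```

Each maximal set stands for all of its subsets with at least 2 items. Expanding them (for example with combn) is only practical when the sets are small, since a shared set of 30 items already has more than a billion such subsets. For a threshold above 2 rows, the same idea would need intersections of row triples and beyond, or a frequent-itemset miner such as the Apriori/Eclat implementations in the arules package.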

Thanks in advance for any answer or hint :)

Cheers

# all items that are TRUE in 2 or more rows
items <- names(which(colSums(data) >= 2))
# all possible combinations of 2 (or more) of these items
# (candidates only: co-occurrence in the same rows is not checked)
lapply(2:length(items), function(x) combn(items, x))
# [[1]]
#      [,1]     [,2]     [,3]    
# [1,] "item_1" "item_1" "item_2"
# [2,] "item_2" "item_3" "item_3"
# 
# [[2]]
#      [,1]    
# [1,] "item_1"
# [2,] "item_2"
# [3,] "item_3"
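The code above pre-filters single items and then lists every combination of them; it does not check that the items of a combination are TRUE in the same rows. That gives the right result for this example, but if column A were TRUE only in rows 1 and 2 and column B only in rows 3 and 4, it would still report A, B. A minimal sketch that adds the missing check (the function name `feasible_combos` is hypothetical; it still enumerates all candidate combinations, so it only suits a small number of candidate items):

```r
# Example data from the question
data <- data.frame(matrix(FALSE, nrow = 3, ncol = 5))
colnames(data) <- paste("item_", 1:5, sep = "")
rownames(data) <- paste("Process_", 1:3, sep = "")
data["Process_1", c("item_1", "item_2", "item_3")] <- TRUE
data["Process_2", c("item_2", "item_3")] <- TRUE
data["Process_3", c("item_1", "item_2", "item_3", "item_4", "item_5")] <- TRUE

# Hypothetical helper: combinations of >= 2 columns that are all TRUE
# together in at least `min_rows` rows
feasible_combos <- function(data, min_rows = 2) {
  # Step 1 (as in the answer): items TRUE in at least `min_rows` rows
  items <- names(which(colSums(data) >= min_rows))
  if (length(items) < 2) return(list())
  # Step 2: every combination of 2..length(items) items, as a flat list
  candidates <- unlist(
    lapply(2:length(items), function(k) combn(items, k, simplify = FALSE)),
    recursive = FALSE
  )
  # Step 3: keep combinations whose items are TRUE in the same rows often enough
  Filter(function(combo) {
    sum(rowSums(data[, combo, drop = FALSE]) == length(combo)) >= min_rows
  }, candidates)
}

res <- feasible_combos(data)
sapply(res, paste, collapse = ", ")
# returns c("item_1, item_2", "item_1, item_3", "item_2, item_3", "item_1, item_2, item_3")

# Counterexample: A and B are each TRUE in 2 rows, but never in the same row
toy <- data.frame(A = c(TRUE, TRUE, FALSE, FALSE), B = c(FALSE, FALSE, TRUE, TRUE))
feasible_combos(toy)  # empty list: the pair A, B is rejected
```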
