
Narrow matrix by conditional restrictions

Consider the following vector and the matrix of its combinations:

string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous", 
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")
combinations <- utils::combn(string_vec, 6)

Now let's define the vector

a <- c("huge", "big")

I now want to restrict the matrix to the columns that contain at least one element from each family named in a.

That is, each column should contain at least one element starting with huge (those elements are "huge", "huge_big", "huge_amount") and at least one starting with big ("big_huge", "big_very").

Do you have any idea how this can be done?

EDIT

I figured out that we can use apply with startsWith:

# ">= 1" keeps columns with at least one match ("== 1" would keep only columns with exactly one)
combinations <- combinations[,colSums(apply(combinations, 2, function(x) startsWith(x, "huge")))>=1]
combinations <- combinations[,colSums(apply(combinations, 2, function(x) startsWith(x, "big")))>=1]

But I find this solution highly inefficient, because each family needs its own manually written step (with a longer family vector this becomes very problematic).

We could paste the 'a' vector into a single string and use that as the pattern in grepl. We then loop over the columns of the matrix (either with base R apply, or with dapply from collapse, which would be more efficient), check whether there are any matches in each column, and subset the columns with the logical output:

library(collapse)
out <- combinations[,dapply(combinations, FUN = 
    function(x) any(grepl(paste(a, collapse='|'), x)), MARGIN = 2)]
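For comparison, the base R apply variant mentioned above might look like the following. This is only a sketch: the names pattern, keep and out_any are illustrative and not part of the original answer. Note that this version keeps a column if it matches any of the words in 'a', not necessarily all of them.

```r
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")
combinations <- utils::combn(string_vec, 6)
a <- c("huge", "big")

# Build one alternation pattern from 'a': "huge|big"
pattern <- paste(a, collapse = "|")

# For each column: does at least one string match the pattern?
keep <- apply(combinations, 2, function(x) any(grepl(pattern, x)))

# drop = FALSE keeps the result a matrix even if only one column survives
out_any <- combinations[, keep, drop = FALSE]
dim(out_any)
#[1]    6 1709
```

Only the 7 combinations drawn entirely from the seven strings that contain neither "huge" nor "big" are dropped, which is why 1709 of the 1716 columns remain.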

If we want at least one match for every element of 'a':

out <- combinations[,dapply(combinations, FUN = 
    function(x) Reduce(`&`, lapply(a, function(u) any(grepl(u, x)))), MARGIN = 2)]
dim(out)
#[1]    6 1555

If the check should only match elements that start with the words in 'a', paste ^ in front of each word:

out <- combinations[,dapply(combinations, FUN = 
     function(x) Reduce(`&`, lapply(paste0('^', a), 
       function(u) any(grepl(u, x)))), MARGIN = 2)]
dim(out)
#[1]    6 1072
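The column count can be sanity-checked without building the matrix, by inclusion-exclusion over the two prefix families. The sketch below assumes the families are disjoint (no string starts with both "huge" and "big", which holds for this data); the variable names are illustrative.

```r
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")

n <- length(string_vec)                         # 13 strings in total
k <- 6                                          # size of each combination
n_huge <- sum(startsWith(string_vec, "huge"))   # 3 strings start with "huge"
n_big  <- sum(startsWith(string_vec, "big"))    # 2 strings start with "big"

# all combinations
#   minus those with no "huge" element
#   minus those with no "big" element
#   plus those with neither (they were subtracted twice)
expected <- choose(n, k) -
  choose(n - n_huge, k) -
  choose(n - n_big, k) +
  choose(n - n_huge - n_big, k)
expected
#[1] 1072
```

That is 1716 - 210 - 462 + 28 = 1072, which agrees with dim(out) above.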

A base R option using combn

(cbs <- combn(string_vec, 6))[
    ,
    colSums(matrix(grepl("^huge", cbs), nrow = 6)) > 0
    & colSums(matrix(grepl("^big", cbs), nrow = 6)) > 0
]
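The base R version above hard-codes the two prefixes. A possible generalisation to an arbitrary vector 'a' is sketched below; has_prefix and keep are illustrative names, and startsWith is swapped in for grepl("^...") so that the words in 'a' are treated as literal prefixes rather than regular expressions.

```r
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")
a <- c("huge", "big")
cbs <- utils::combn(string_vec, 6)

# For each prefix in 'a', build a logical vector with one entry per column:
# TRUE if the column holds at least one string starting with that prefix
has_prefix <- lapply(a, function(p) {
  colSums(matrix(startsWith(cbs, p), nrow = nrow(cbs))) > 0
})

# A column is kept only if every prefix is represented
keep <- Reduce(`&`, has_prefix)

out <- cbs[, keep, drop = FALSE]
dim(out)
#[1]    6 1072
```

Adding another family then only requires extending 'a'; no further filtering step has to be written by hand.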


 