Let's consider the following vector and the matrix of its combinations:
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
"huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")
combinations <- utils::combn(string_vec, 6)
And let's define the vector
a = c("huge", "big")
I now want to restrict my matrix to the columns that contain at least one element from each family of elements defined by a, i.e. at least one element starting with huge (those elements are "huge", "huge_big", "huge_amount") and at least one starting with big ("big_huge", "big_very").
Do you have any idea how this can be done?
EDIT
I figured out that we can use apply with startsWith:
combinations <- combinations[,colSums(apply(combinations, 2, function(x) startsWith(x, "huge"))) >= 1]
combinations <- combinations[,colSums(apply(combinations, 2, function(x) startsWith(x, "big"))) >= 1]
But I find this solution highly inefficient, because I have to write each step manually for every family (with a longer family vector this becomes very problematic).
We could paste the 'a' vector into a single string and use that as the pattern in grepl, loop over the columns of the matrix (either with base R apply, or with dapply from collapse, which would be more efficient), check whether there are any matches in each column, and subset the columns with the logical output. This keeps the columns that match any element of 'a':
library(collapse)
out <- combinations[,dapply(combinations, FUN =
function(x) any(grepl(paste(a, collapse='|'), x)), MARGIN = 2)]
If every element of 'a' must match at least one entry in the column:
out <- combinations[,dapply(combinations, FUN =
function(x) Reduce(`&`, lapply(a, function(u) any(grepl(u, x)))), MARGIN = 2)]
dim(out)
#[1] 6 1555
If the entries have to start with those words in 'a', paste them with ^ to anchor the pattern at the start of the string:
out <- combinations[,dapply(combinations, FUN =
function(x) Reduce(`&`, lapply(paste0('^', a),
function(u) any(grepl(u, x)))), MARGIN = 2)]
dim(out)
#[1] 6 1072
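As a sanity check on the 1072 figure, the count can be reproduced without building the matrix. The sketch below uses inclusion-exclusion and assumes the two prefix families are disjoint (no string starts with both "huge" and "big"), which holds for this string_vec. The names n, k, n_huge, n_big and expected are only illustrative.

```r
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")

n <- length(string_vec)                          # 13 strings in total
k <- 6                                           # size of each combination
n_huge <- sum(startsWith(string_vec, "huge"))    # strings starting with "huge"
n_big  <- sum(startsWith(string_vec, "big"))     # strings starting with "big"

# all combinations
#   minus those with no "huge"-prefixed string
#   minus those with no "big"-prefixed string
#   plus those with neither (subtracted twice above)
expected <- choose(n, k) -
  choose(n - n_huge, k) -
  choose(n - n_big, k) +
  choose(n - n_huge - n_big, k)
expected
```

If expected agrees with ncol(out), the filter is keeping exactly the columns that contain at least one string from each family.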
A base R option using combn:
(cbs <- combn(string_vec, 6))[
,
colSums(matrix(grepl("^huge", cbs), nrow = 6)) > 0
& colSums(matrix(grepl("^big", cbs), nrow = 6)) > 0
]
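The same base R idea can be extended to a family vector of any length, so that no condition has to be typed by hand. This is only a sketch: it assumes the families are plain prefixes, so startsWith is used instead of a regular expression, and the names hits and keep are illustrative.

```r
string_vec <- c("huge", "small", "small_very", "something", "big_huge", "big_very", "tremendous",
                "huge_big", "huge_amount", "small_amout", "else", "something_big", "something_small")
a <- c("huge", "big")

cbs <- combn(string_vec, 6)

# For each prefix in 'a', build a logical vector with one entry per column:
# TRUE if the column has at least one string starting with that prefix.
hits <- lapply(a, function(p) {
  colSums(matrix(startsWith(as.vector(cbs), p), nrow = nrow(cbs))) > 0
})

# A column is kept only if it satisfies the condition for every prefix.
keep <- Reduce(`&`, hits)
out <- cbs[, keep, drop = FALSE]
dim(out)
```

Adding another family then only requires appending its prefix to a; drop = FALSE keeps the result a matrix even if a single column survives.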