
R: Filter vectors by 'two-way' partial match

With two vectors

x <- c("abc", "12")
y <- c("bc", "123", "nomatch")

is there a way to filter both by 'two-way' partial matching (removing an element from one vector if it contains, or is contained in, any element of the other vector), so that the result is these two vectors:

x1 <- c()
y1 <- c("nomatch")

To explain: every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update: it is not sufficient for a substring to match only the initial characters; a substring may be found anywhere in the string it matches. The example above has been updated to reflect this.

I originally thought ?pmatch might be handy, but your edit clarifies you don't just want to match the start of items. Here's a function that should work:

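remover <- function(x,y) {
    # pmx[[i]]: positions in y that contain x[i]
    # pmy[[j]]: positions in x that contain y[j]
    pmx <- lapply(x, grep, x=y)
    pmy <- lapply(y, grep, x=x)

    # keep the two index sets apart: an element is a hit if it is found in
    # the other vector, or if something from the other vector is found in it
    hitx <- lengths(pmx) > 0 | seq_along(x) %in% unlist(pmy)
    hity <- lengths(pmy) > 0 | seq_along(y) %in% unlist(pmx)

    list(
        x[!hitx],
        y[!hity]
    )
}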

remover(x,y)
#[[1]]
#character(0)
#
#[[2]]
#[1] "nomatch"

It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):

remover("yo","nomatch")
#[[1]]
#[1] "yo"
# 
#[[2]]
#[1] "nomatch"
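
A small illustration of the two points behind this answer, with made-up inputs that are not part of the question: pmatch only matches from the start of a string, and grep/grepl treat the pattern as a regular expression unless fixed = TRUE is given. If the strings may contain characters such as "." or "(", literal matching is probably what is wanted.

```r
# pmatch only considers the beginning of the target string
start_only <- pmatch("bc", "abc")   # NA: "bc" is not a prefix of "abc"
prefix     <- pmatch("ab", "abc")   # 1:  "ab" is a prefix of "abc"

# Hypothetical input containing a regex metacharacter
y <- c("abc", "123", "nomatch")

# As a regular expression, "." matches any character, so "a.c" matches "abc"
as_regex <- grepl("a.c", y)
# With fixed = TRUE the pattern is taken literally, so nothing matches
literal  <- grepl("a.c", y, fixed = TRUE)

print(as_regex)  # TRUE FALSE FALSE
print(literal)   # FALSE FALSE FALSE
```

The same fixed = TRUE argument can be passed through the apply-style calls above, for example lapply(x, grep, x = y, fixed = TRUE).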

We can do the following:

# Return a data.frame of matches of a in b:
# one column per element of a, one row per element of b
m <- function(a, b) {
    data.frame(sapply(a, function(w) grepl(w, b), simplify = FALSE))
}

# Match x and y and remove
x0 <- x[!apply(m(x, y), 2, any)]
y0 <- y[!apply(m(x, y), 1, any)]

# Match y and x and remove
x1 <- x0[!apply(m(y0, x0), 1, any)]
y1 <- y0[!apply(m(y0, x0), 2, any)]

x1
#character(0)

y1
#[1] "nomatch"
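
One caveat worth checking with the two passes above: the second pass only sees what survived the first pass, so an element of x that contains an already-removed element of y may be kept. The sketch below is one possible variant, not the original answer's code, that tests both directions against the original vectors. The input values are hypothetical and chosen to trigger that case.

```r
# Same helper as above: one column per element of a, one row per element of b
m <- function(a, b) {
    data.frame(sapply(a, function(w) grepl(w, b), simplify = FALSE))
}

# Hypothetical input: "abc" is removed from y because it contains "b",
# but "abcd" in x still contains "abc" and should be removed as well
x <- c("abcd", "b")
y <- c("abc", "zzz")

xy <- m(x, y)  # columns: x, rows: y  (is x[j] inside y[i]?)
yx <- m(y, x)  # columns: y, rows: x  (is y[j] inside x[i]?)

# An element is dropped if it matches in either direction
x1 <- x[!(apply(xy, 2, any) | apply(yx, 1, any))]
y1 <- y[!(apply(xy, 1, any) | apply(yx, 2, any))]

print(x1)  # character(0)
print(y1)  # "zzz"
```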

I build a matrix of all possible matches in each direction, combine the two with | (a match in either direction counts as a match), and then use the result to subset x and y:

x <- c("abc", "12")
y <- c("bc", "123", "nomatch")

bool_mat <- sapply(x,function(z) grepl(z,y)) | t(sapply(y,function(z) grepl(z,x)))
x1 <- x[!apply(bool_mat,2,any)] # character(0)
y1 <- y[!apply(bool_mat,1,any)] # [1] "nomatch"
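
The matrix approach above can be wrapped in a function. This is only a sketch: the names two_way_filter and contains are made up here, literal matching via fixed = TRUE is an assumption about what is wanted, and vapply plus an explicit matrix() call is used so that the shapes still line up when one of the vectors has a single element.

```r
# Hypothetical helper (not from the original answer)
two_way_filter <- function(x, y) {
    # contains(a, b)[i, j] is TRUE when a[j] occurs somewhere inside b[i]
    contains <- function(a, b) {
        hits <- vapply(a, function(p) grepl(p, b, fixed = TRUE),
                       logical(length(b)))
        matrix(hits, nrow = length(b), ncol = length(a))
    }
    # rows index y, columns index x; a hit in either direction counts
    hit <- contains(x, y) | t(contains(y, x))
    list(x = x[colSums(hit) == 0], y = y[rowSums(hit) == 0])
}

res <- two_way_filter(c("abc", "12"), c("bc", "123", "nomatch"))
print(res$x)  # character(0)
print(res$y)  # "nomatch"

# Single-element vectors also work
res_single <- two_way_filter("yo", "nomatch")
```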
