Find a vector of strings in R

I have a vector of strings like:

vector=c("a","hb","cd")

I also have a matrix with a column in which each element is a string of items separated by "|", like:

1 "ab|hb"

2 "ab|hbc|cd"

For each string in vector, I want to find the row of the matrix in which it appears as a complete item, not merely as part of a longer item.

For the above vector, the result is:

NA, 1, 2

You can use strsplit for splitting strings:

x <- strsplit("ab|hbc|cd", split="|", fixed=TRUE)

and then check whether the values of vector appear in the data, e.g.

sapply(vector, function(x) x %in% strsplit("a|ab|cd|efg|bh",
                                     split="|", fixed=TRUE)[[1]])

Warning: strsplit returns a list, so in the example above I extract only its first element with [[1]]; you can handle the list another way if you prefer.

EDIT: in answer to your question about data stored as a vector:

data <- c("ab|cd|ef", "aaa|b", "ab", "wf", "fg|hb|a", "cd|cd|df")

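# simplify = FALSE keeps the split items as a list even when every
# element of data happens to contain the same number of items
sapply(sapply(data, function(x) strsplit(x, split="|", fixed=TRUE)[[1]],
              simplify = FALSE),
  function(y) sapply(vector, function(z) z %in% y))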
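
To get the row numbers the question asks for (NA, 1, 2) instead of a logical matrix, the same split-and-compare idea can be sketched roughly as follows. The names mat, items, and first_row are illustrative assumptions, and this version reports only the first matching row for each string.

```r
# Example data from the question
vector <- c("a", "hb", "cd")
mat <- matrix(c("ab|hb", "ab|hbc|cd"), nrow = 2)

# Split every row of the column into its individual items
items <- strsplit(mat[, 1], split = "|", fixed = TRUE)

# For each string, find the first row whose items contain it exactly
first_row <- sapply(vector, function(v) {
  hits <- which(sapply(items, function(it) v %in% it))
  if (length(hits)) hits[1] else NA_integer_
}, USE.NAMES = FALSE)

first_row
# [1] NA  1  2
```

If a string can occur in several rows and all of them are wanted, returning hits instead of hits[1] from an lapply call would keep every match.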

Here's an approach using regular expressions:

# Example data
vector <- c("a","hb","cd")
mat <- matrix(c("ab|hb", "ab|hbc|cd"), nrow = 2)

sapply(paste0("\\b", vector, "\\b"), function(x)
         if(length(tmp <- grep(x, mat[ , 1]))) tmp else NA,
       USE.NAMES = FALSE)
# [1] NA  1  2
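
One caveat: \\b treats any non-word character as a boundary, so the pattern above may match too much if the items themselves contain characters such as "." or "-". A possible alternative, sketched here under the assumption that "|" never occurs inside an item, pads each row with the separator and uses fixed (non-regex) matching. The names padded and rows are hypothetical.

```r
# Example data
vector <- c("a", "hb", "cd")
mat <- matrix(c("ab|hb", "ab|hbc|cd"), nrow = 2)

# Pad each row so that every item is enclosed by "|" on both sides
padded <- paste0("|", mat[, 1], "|")

# Look for "|item|" literally; keep all matching rows, or NA if none
rows <- lapply(vector, function(v) {
  hits <- grep(paste0("|", v, "|"), padded, fixed = TRUE)
  if (length(hits)) hits else NA_integer_
})

unlist(rows)
# [1] NA  1  2
```

Because lapply returns a list, a string that appears in more than one row keeps all of its row numbers; unlist is only safe for display when each string matches at most one row.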
