
Collapse strings in a vector three times for an OR statement in R

I have a vector with multiple strings:

strings <- c("CD4","CD8A")

and I'd like to build an OR pattern to pass to grep, like so:

"CD4-|-CD4-|-CD4$|CD8A-|-CD8A-|-CD8A$"

and so on for each element in the vector.

Basically, I'm trying to find an exact word in a string made of three dash-separated parts (I don't want grep("CD4", ...) to return strings containing CD40). This is how I thought of doing it, but I'm open to other suggestions.
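A minimal sketch of building exactly the pattern shown above: paste0() is vectorised, so each element of strings is expanded into its three alternatives, and collapse = "|" then joins all of them into a single OR pattern. The genes vector simply reuses the example values from the question.

```r
strings <- c("CD4", "CD8A")

# For each gene, build "X-|-X-|-X$" (start of entry, middle, end of entry),
# then join the per-gene patterns with "|" into one alternation
pattern <- paste0(strings, "-|-", strings, "-|-", strings, "$", collapse = "|")
pattern
# [1] "CD4-|-CD4-|-CD4$|CD8A-|-CD8A-|-CD8A$"

genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# value = TRUE returns the matching elements instead of their indices
grep(pattern, genes, value = TRUE)
# [1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR"    "CD8A-IL3RA-CSF3R"
```

Note that the first alternative, CD4-, is not anchored to the start of the string, so a longer name that merely ends in "CD4" would also match; writing "^CD4-" instead would rule that out.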

part of my data.frame looks like this:我的 data.frame 的一部分看起来像这样:

Genes <- as.data.frame(c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR","MyD88-CD8B-EPOR", "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1"))
colnames(Genes) <- "Genes"

Here is a one-liner:

Genes$Genes[grep(paste0("\\b",strings,"\\b",collapse="|"),Genes$Genes)]

[1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR"    "CD8A-IL3RA-CSF3R"

It uses word-boundary markers (\b, written "\\b" inside an R string) to make sure that it matches only complete substrings: - does not count as a word character, but letters and digits do, so CD4 is not matched inside CD40.
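The sketch below checks that behaviour on the question's data. The explicit-delimiter pattern (^|-)(CD4|CD8A)(-|$) is an alternative added here for comparison, not part of the answer: it relies on the dash itself rather than on R's definition of a word character, which would only make a difference if gene names contained other non-word characters such as ".".

```r
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# Word-boundary version from the answer: "\\b" in an R string is the regex \b
word_pat <- paste0("\\b", strings, "\\b", collapse = "|")
word_pat
# [1] "\\bCD4\\b|\\bCD8A\\b"

# "0" is a word character, so there is no boundary between "CD4" and "0"
grepl("\\bCD4\\b", c("IL2RG-CD4-GHR", "ICOS-CD40-LMP1"))
# [1]  TRUE FALSE

# Alternative (not from the answer): require a dash or a string edge on both sides
dash_pat <- paste0("(^|-)(", paste(strings, collapse = "|"), ")(-|$)")
dash_pat
# [1] "(^|-)(CD4|CD8A)(-|$)"

# Both patterns select the same rows for this data
identical(grep(word_pat, genes, value = TRUE),
          grep(dash_pat, genes, value = TRUE))
# [1] TRUE
```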

I'm not sure I understood the question, but if I did, the following command will return what you want:

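library(magrittr)  # provides the %>% pipe

# Split each entry on "-" and keep only the pieces that start with "CD"
stringr::str_split(Genes$Genes, pattern = '-') %>%
  purrr::map(
    function(data) {
      data[stringr::str_which(data, pattern = '^CD')]
    }
  ) %>%
  unlist()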
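As written, that pipeline returns every dash-separated piece that starts with "CD", so for the example data it also returns CD8B and CD40, which the question wanted to exclude. Below is a base-R sketch of the same split-then-check idea, adapted here for exact matching with %in%; it is an illustration on the question's data, not part of the original answer. Because the pieces are compared as whole strings, no regex escaping or word-boundary rules are involved.

```r
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# Split each entry on a literal "-" (fixed = TRUE skips regex parsing)
pieces <- strsplit(genes, "-", fixed = TRUE)

# Keep an entry if any of its pieces equals one of the wanted names exactly
keep <- vapply(pieces, function(p) any(p %in% strings), logical(1))
genes[keep]
# [1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR"    "CD8A-IL3RA-CSF3R"
```

With the data.frame from the question, the same idea applies to Genes$Genes. In R versions before 4.0, as.data.frame() stores that column as a factor by default, and strsplit() rejects factors, so wrap it in as.character() first.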
