
R: grep exactly using a character vector with multiple patterns

How to match t1 with t2 using the grep function?

t1

t1 %>% head()

[1] "ITGB4"   "GPER1"   "FAM162A" "S100A2"  "MBNL1"   "RNASE11"

t2

t2 %>% head(10)

 [1] ""                                               
 [2] ""                                               
 [3] "RP1-45C12.1;RP1-127D3.4;RP1-127D3.4;RP1-127D3.4"
 [4] "PRKAG2;PRKAG2;PRKAG2"                           
 [5] ""                                               
 [6] "AC022201.4"                                     
 [7] "TLK1"                                           
 [8] ""                                               
 [9] ""                                               
[10] ""

I tried grep(paste(t1, sep = "", collapse = "|"), t2, value = T) %>% unique(), but the output contains gene symbols that are not in t1, or that are not exactly the same as the gene symbols in t1.

Any good ideas on how to match t1 against t2 exactly?
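The over-matching comes from the pattern itself. paste(t1, collapse = "|") builds an unanchored alternation such as MBNL1|TLK1, so each symbol matches anywhere inside an element, including as a prefix of a longer symbol. On top of that, value = T returns the whole ;-separated element together with every other gene it lists. Below is a minimal sketch with made-up vectors. genes and annot are hypothetical stand-ins for t1 and t2.

```r
# Hypothetical stand-ins for t1 (query symbols) and t2 (';'-separated annotations)
genes <- c("MBNL1", "TLK1")
annot <- c("", "MBNL10", "PRKAG2;TLK1", "TLK1")

# Unanchored alternation, as in the attempt above: "MBNL1|TLK1"
naive <- grep(paste(genes, collapse = "|"), annot, value = TRUE)
print(naive)
# "MBNL10" is a false positive (MBNL1 is a substring of it), and
# "PRKAG2;TLK1" is returned whole, so PRKAG2 comes along with TLK1
```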

You have to create every version of the string that an ID can appear in within the ;-separated list: the ID on its own, the ID at the beginning, the ID at the end, and the ID in the middle.

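# Each ID can appear in four positions in a ';'-separated element:
# alone (^ID$), in the middle (;ID;), at the end (;ID$), at the start (^ID;)
search_string <- paste0(c('^', ';', ';', '^'),
                        rep(t1, each = 4),
                        c('$', ';', '$', ';'), collapse = '|')
candidates <- grep(search_string, t2)  # indices of matching elements of t2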
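The four alternatives per ID can also be written as one grouped pattern, (^|;)(ID1|ID2|...)(;|$). This should select the same elements. IDs such as AC022201.4 contain ".", which is a regex wildcard, so the sketch below wraps each ID in PCRE's \Q...\E literal quoting, which requires perl = TRUE. As a regex-free cross-check, it also splits each element on ";" and compares the pieces with %in%. The vectors genes and annot are hypothetical stand-ins for t1 and t2.

```r
# Hypothetical stand-ins for t1 and t2
genes <- c("MBNL1", "TLK1", "AC022201.4")
annot <- c("", "MBNL10", "PRKAG2;TLK1", "TLK1", "AC022201x4", "AC022201.4;GPER1")

# Option 1: one anchored, grouped regex; \Q...\E makes each ID literal (PCRE only)
pattern <- paste0("(^|;)(", paste0("\\Q", genes, "\\E", collapse = "|"), ")(;|$)")
hits_regex <- grep(pattern, annot, perl = TRUE)

# Option 2: no regex at all; split on ';' and test exact membership
parts <- strsplit(annot, ";", fixed = TRUE)
hits_split <- which(vapply(parts, function(p) any(p %in% genes), logical(1)))

print(hits_regex)  # 3 4 6
print(identical(hits_regex, hits_split))  # TRUE

# Which query symbols were actually found somewhere in annot
found_genes <- intersect(genes, unlist(parts))
print(found_genes)  # "TLK1" "AC022201.4"
```

The strsplit route also tells you which t1 symbols were found, which the index-based grep call does not.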
