Suppose I have a string x like so:
x <- "CTTTANNNNNNNYG"
I would like to replace each letter in x with a different string that may not be of the same length.
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)","(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")
If I wanted to replace the letters in vector a with the corresponding ones in vector b, I would want to manipulate string x into:
"CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
I've tried using mapply(gsub, a, b, x) and str_replace() to no avail. Any help would be appreciated.
We can use mgsub from the qdap package:
library(qdap)
mgsub(a, b, x)
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
Since the replacements are fixed and each pattern is just one letter, you can achieve the same result without using regex or any additional packages. For instance:
vapply(strsplit(x,"",fixed=TRUE),function(z) paste(setNames(b,a)[z],collapse=""),"")
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
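The one-liner above packs three steps together: build a named lookup vector, split the string into single characters, and index the lookup by those characters. A minimal sketch that spells the steps out is below. The helper name expand_codes is hypothetical (it is not a base R function), and leaving unmapped characters unchanged is an assumption added here; the original one-liner would produce "NA" for them instead.

```r
# Inputs copied from the question.
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

# expand_codes is a hypothetical helper name, not part of base R.
expand_codes <- function(strings, from, to) {
  lookup <- setNames(to, from)                   # names are the single letters
  chars  <- strsplit(strings, "", fixed = TRUE)  # one character vector per string
  vapply(chars, function(z) {
    out <- lookup[z]                             # look up every character by name
    missing <- is.na(out)
    out[missing] <- z[missing]                   # assumption: keep unmapped characters
    paste(out, collapse = "")
  }, character(1))
}

result <- expand_codes(x, a, b)
print(result)
```

Because every character is looked up exactly once, the order of a and b does not matter with this approach, and text inserted by one replacement is never touched by another.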
If you want to do this with base functions, you basically need to apply each of the replacements sequentially (gsub isn't vectorized in this way). Here's one way to do that:
Reduce(
  function(x, replace) {
    gsub(replace$pattern, replace$value, x)
  },
  Map(function(a, b) list(pattern = a, value = b), a, b),
  init = x
)
# [1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
We use Map to make pairs of match/replace values and then sequentially apply them to the string with Reduce.
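For readers less familiar with Reduce, the same sequential idea can be sketched as a plain for loop. This is an equivalent illustration rather than the answer's own code; the use of fixed = TRUE is an addition here, chosen because every pattern is a literal single letter.

```r
# Inputs copied from the question.
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

result <- x
for (i in seq_along(a)) {
  # Each pass rewrites the output of the previous pass.
  # fixed = TRUE treats a[i] as a literal string, not a regular expression.
  result <- gsub(a[i], b[i], result, fixed = TRUE)
}
print(result)
```

Sequential replacement is order-sensitive in general, because a later pattern can match text inserted by an earlier replacement. It is safe for these particular vectors: the replacements contain only A, C, G, T and the characters (, | and ), and those four letters map to themselves.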