Regex R substituting in a vector of replacements with parentheses

Suppose I have a string x like so.

x <- "CTTTANNNNNNNYG"

I would like to replace each letter in x with a different string that may not be of the same length.

a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)","(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

If I wanted to replace the letters in vector a with the corresponding ones in vector b, I would want to manipulate string x into:

"CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

I've tried using mapply(gsub, a,b,x) and str_replace() to no avail. Any help would be appreciated.

We can use mgsub from the qdap package:

library(qdap)
mgsub(a, b, x)
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

Since the replacements are fixed strings and each pattern is a single letter, you can get the same result without regex or any additional packages. For instance:

vapply(strsplit(x, "", fixed = TRUE), function(z) paste(setNames(b, a)[z], collapse = ""), "")
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
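
The one-liner above can be easier to follow when broken into steps. This is only a sketch of the same idea; the names lookup, chars and result are introduced here for illustration and are not part of the original answer.

```r
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

# Named vector acting as a lookup table: names are the letters in a,
# values are the corresponding replacements in b
lookup <- setNames(b, a)

# Split the string into single characters
chars <- strsplit(x, "", fixed = TRUE)[[1]]

# Look up each character by name and glue the pieces back together
result <- paste(lookup[chars], collapse = "")
result
```

Note that a character missing from a would be looked up as NA and pasted as the text "NA", so this approach assumes every letter in x appears in a.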

If you want to do this with base functions, you basically need to apply the replacements sequentially (gsub isn't vectorized over pattern and replacement in this way). Here's one way to do that:

Reduce(
    function(x, replace) {
        gsub(replace$pattern, replace$value, x)
    }, 
    Map(function(a,b) list(pattern=a, value=b), a, b), 
    init=x
)
# [1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

We use Map to make pairs of match/replace values and then sequentially apply them to the string with Reduce.
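
For comparison, the same sequential idea can be written as a plain loop. This is a sketch rather than the original answer's code: the variable result and the fixed = TRUE argument are additions made here, the latter on the assumption that every pattern is a literal single letter.

```r
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

result <- x
for (i in seq_along(a)) {
  # Apply the i-th replacement to the output of the previous ones;
  # fixed = TRUE treats a[i] as a literal string, not a regex
  result <- gsub(a[i], b[i], result, fixed = TRUE)
}
result
```

Because each step runs on the output of the previous one, this only works when no replacement introduces a character that a later pattern would change. That holds here: the replacements contain only A, C, G, T (which map to themselves) plus parentheses and |, none of which appear in a.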


 