
Positive look ahead in R - passing variables

I'm stuck on a regular expression. I usually use this line of code to find overlapping occurrences of a motif in a string:

gregexpr("(?=ATGGGCT)",text,perl=TRUE)
[[1]]
[1]  16  45  52  75 203 210 266 273 327 364 436 443 480 506 534 570 649
attr(,"match.length")
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
attr(,"useBytes")
[1] TRUE

Now I want to give gregexpr a pattern that is stored in a variable:

x="GGC"

Of course, if I write x inside the pattern string, gregexpr searches for the literal character "x" rather than the value the variable holds:

gregexpr("(?=x)",text,perl=TRUE)
[[1]]
[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE

How can I pass the value of my variable to gregexpr inside a positive lookahead?

I'd play with the sprintf function:

x <- "AGA"
text <- "ACAGAGACTTTAGATAGAGAAGA"
gregexpr(sprintf("(?=%s)", x), text, perl=TRUE)
## [[1]]
## [1]  3  5 12 16 18 21
## attr(,"match.length")
## [1] 0 0 0 0 0 0
## attr(,"useBytes")
## [1] TRUE

sprintf replaces the %s placeholder with the value of x.
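
One caveat when building the pattern this way: any regex metacharacters in x are interpreted as regex syntax. A minimal sketch, assuming x should be matched literally, wraps it in PCRE's \Q...\E quoting. The sample strings are made up for illustration, and this would not cope with an x that itself contains \E:

```r
x <- "A.A"          # illustrative value containing a metacharacter
text <- "AGAA.A"    # illustrative text

# Unquoted: "." matches any character, so "AGA" at position 1 also counts
plain_hits <- as.integer(gregexpr(sprintf("(?=%s)", x), text, perl = TRUE)[[1]])

# Quoted with \Q...\E: x is matched literally, character for character
quoted_hits <- as.integer(gregexpr(sprintf("(?=\\Q%s\\E)", x), text, perl = TRUE)[[1]])
```

For plain DNA motifs such as "GGC" the quoting makes no difference, so it is only worth adding when x may come from arbitrary input.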

You could use paste0, which is short for paste(..., sep = ""):

x <- "GGC"
text <- 'ATGGGCTATGGGCTATGGGCTATGGGCT'
gregexpr(paste0('(?=', x, ')'), text, perl=TRUE)
# [[1]]
# [1]  4 11 18 25
# attr(,"match.length")
# [1] 0 0 0 0
# attr(,"useBytes")
# [1] TRUE

And if you want to access the overlapping matches themselves, take a look at the question "Overlapping matches in R".
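
Because the lookahead is zero-width, gregexpr only reports start positions (match.length is 0). As a rough sketch of one way to recover the matched substrings, assuming x is a fixed string so every match has nchar(x) characters, the positions can be fed to substring; the variable names starts and matches are just illustrative:

```r
x <- "AGA"
text <- "ACAGAGACTTTAGATAGAGAAGA"

m <- gregexpr(paste0("(?=", x, ")"), text, perl = TRUE)[[1]]

# gregexpr signals "no match" with -1, so keep only real positions
starts <- as.integer(m)
starts <- starts[starts > 0]

# Cut out nchar(x) characters beginning at each start position
matches <- substring(text, starts, starts + nchar(x) - 1)
```

With a variable-length pattern this fixed-width slicing would not apply, and a capture group inside the lookahead would be the more natural route.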

The fn$ prefix in gsubfn package supports string interpolation:

library(gsubfn)

# test data
text <- "ATGGGCTAAATGGGCT"
x <- "GGGC"

fn$gregexpr("(?=$x)", text, perl = TRUE)

See ?fn, the gsubfn home page, and the gsubfn vignette, vignette("gsubfn").

OK, I solved it this way:

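text <- "ATGGGCTAAATGGGCT"
x <- "GGC"
# Build the lookahead pattern from the variable
# (named `pattern` rather than `c` to avoid shadowing base::c)
pattern <- paste("(?=", x, ")", sep = "")
r <- gregexpr(pattern, text, perl = TRUE)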
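
Once the pattern is built from a variable, the same idea extends to several motifs at once. The following is only a sketch: overlap_positions is a hypothetical helper name, not part of base R or any package, and it assumes the motifs contain no regex metacharacters:

```r
# Hypothetical helper: start positions of every (possibly overlapping)
# occurrence of each motif in `text`, returned as a named list.
overlap_positions <- function(motifs, text) {
  hits <- lapply(motifs, function(motif) {
    pos <- as.integer(gregexpr(paste0("(?=", motif, ")"), text, perl = TRUE)[[1]])
    pos[pos > 0]  # drop the -1 that gregexpr returns when nothing matches
  })
  stats::setNames(hits, motifs)
}

res <- overlap_positions(c("GGC", "GG", "TTT"), "ATGGGCTAAATGGGCT")
```

Here res$GGC and res$GG hold the start positions for each motif, and res$TTT is an empty integer vector because that motif does not occur.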
