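I am trying to get the start and end position of each run of dashes in V1, together with the corresponding characters from V2.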
Any help will be appreciated!
ip <- structure(list(V1 = c("ab---cdef", "abcd---ef", "a--bc--def"),
V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz")), class = "data.frame", row.names = c(NA,
-3L))
I tried the stringi locate functions, but they report each dash as a separate match. For example, for "ab---cdef" I get 3-3, 4-4, 5-5 instead of 3-5.
Expected output:
op <- structure(list(V1 = c("ab---cdef", "abcd---ef", "a--bc--def"),
V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz"), output = c("x:x-3:5-3",
"x:y-5:7-3", "x:x-2:3-2; y-z:6:7-2")), class = "data.frame", row.names = c(NA,
-3L))
The output column should look like this:
V1         V2         output
ab---cdef  xxxxxxxyy  x:x-3:5-3
Thanks!
Here's an example using gregexpr to get all the matches in a string.
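x <- gregexpr("-+", ip$V1)  # one match per run of dashes, per string
mapply(function(m, r) {
  start <- as.integer(m)                 # start position of each run
  len <- attr(m, "match.length")         # length of each run
  end <- start + len - 1
  part <- mapply(substr, r, start, end)  # corresponding slice of V2
  paste0(part, "-", start, ":", end, "-", len, collapse = ";")
}, x, ip$V2)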
# [1] "xxx-3:5-3"
# [2] "xyy-5:7-3"
# [3] "xx-2:3-2;yz-6:7-2"
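To see where start and len in the code above come from, it may help to look at the match object for a single string. This is only a small sketch on the first value of V1; the variable names s, m_run and m_single are made up for illustration.

```r
s <- "ab---cdef"

# "-+" matches a whole run of dashes, so there is one match per run
m_run <- gregexpr("-+", s)[[1]]
as.integer(m_run)                # start position of each run
attr(m_run, "match.length")      # length of each run

# "-" matches every dash on its own, which is what produces 3-3, 4-4, 5-5
m_single <- gregexpr("-", s)[[1]]
as.integer(m_single)
attr(m_single, "match.length")
```

With "-+" there is a single match starting at 3 with length 3, while "-" gives three matches of length 1.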
I'm not sure what your logic was for turning xxx into x:x or xyy into x:y, or how that generalizes to other sequences, so feel free to change that part. But you can get the start and length of each match from the attributes of the returned match object. The important thing is to use -+ as the pattern, so that you match a run of dashes rather than a single dash.
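If the intended label is simply the first and last V2 character covered by each run (that is only a guess based on the expected output, and the helper name label_runs is hypothetical), a variant could look roughly like this:

```r
ip <- data.frame(
  V1 = c("ab---cdef", "abcd---ef", "a--bc--def"),
  V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each dash run with the first and last V2
# character it covers. This rule is an assumption, not stated in the question.
label_runs <- function(v1, v2) {
  m <- gregexpr("-+", v1)[[1]]
  if (m[1] == -1) return(NA_character_)  # no dashes in this string
  start <- as.integer(m)
  len <- attr(m, "match.length")
  end <- start + len - 1
  first <- substring(v2, start, start)   # V2 character at the start of each run
  last <- substring(v2, end, end)        # V2 character at the end of each run
  paste0(first, ":", last, "-", start, ":", end, "-", len, collapse = "; ")
}

ip$output <- unname(mapply(label_runs, ip$V1, ip$V2))
ip$output
```

For the sample data this gives x:x-3:5-3 and x:y-5:7-3 for the first two rows. For the third row it gives x:x-2:3-2; y:z-6:7-2, which differs slightly from the y-z:6:7-2 in the question, so adjust the paste0 call if that format was intentional.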