
start and end positions of a character; R

I am trying to get:

  1. the start and end positions of each run of "-" characters in column V1,
  2. the characters at those positions in column V2,
  3. the length of each run.

Any help will be appreciated!

ip <- structure(list(V1 = c("ab---cdef", "abcd---ef", "a--bc--def"), 
    V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz")), class = "data.frame", row.names = c(NA, 
-3L))

I tried stringi_locate, but it reports every dash as a separate match. For example, for "ab---cdef" it returns 3-3, 4-4, 5-5 instead of 3-5.

Expected output:

op <- structure(list(V1 = c("ab---cdef", "abcd---ef", "a--bc--def"), 
    V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz"), output = c("x:x-3:5-3", 
    "x:y-5:7-3", "x:x-2:3-2; y-z:6:7-2")), class = "data.frame", row.names = c(NA, 
-3L))

The output column must contain:

  1. the characters in column V2 at the start and end of each "-" run in V1,
  2. then the start and end positions,
  3. then the length of the run.
   V1          V2           output
ab---cdef    xxxxxxxyy     x:x-3:5-3

Thanks!

Here's an example using gregexpr to get all the matches in a string.

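# "-+" matches each run of consecutive dashes rather than each single dash.
x <- gregexpr("-+", ip$V1)
mapply(function(m, r) {
  start <- as.integer(m)             # start position of each run
  len <- attr(m, "match.length")     # length of each run
  end <- start + len - 1             # end position of each run
  part <- substring(r, start, end)   # characters of V2 at those positions
  paste0(part, "-", start, ":", end, "-", len, collapse = ";")
}, x, ip$V2)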
# [1] "xxx-3:5-3"         
# [2] "xyy-5:7-3"        
# [3] "xx-2:3-2;yz-6:7-2"

I'm not sure what your logic was for turning xxx into x:x or xyy into x:y, or how that generalizes to other sequences, so feel free to change that part. The match object returned by gregexpr holds the start positions, and its match.length attribute holds the length of each match; the end position is start + length - 1. The important part is to use -+ as the pattern so that you match a run of dashes rather than a single dash.
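If the intended format is "first character:last character", which is one possible reading of the expected output (an assumption, since the third row in the question is written inconsistently), a minimal base R sketch could look like this. The helper name describe_runs is hypothetical and not part of any package.

```r
# Sample data from the question.
ip <- data.frame(
  V1 = c("ab---cdef", "abcd---ef", "a--bc--def"),
  V2 = c("xxxxxxxyy", "xxxxxyyyy", "xxxyyyzzzz"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: describe every run of dashes in v1 using the
# characters of v2 at the first and last position of that run.
# Assumes the format "first:last-start:end-length".
describe_runs <- function(v1, v2) {
  m <- gregexpr("-+", v1)[[1]]
  if (m[1] == -1) return(NA_character_)  # no dashes in this string
  start <- as.integer(m)
  len <- attr(m, "match.length")
  end <- start + len - 1L
  first <- substring(v2, start, start)   # V2 character where the run starts
  last <- substring(v2, end, end)        # V2 character where the run ends
  paste0(first, ":", last, "-", start, ":", end, "-", len, collapse = "; ")
}

ip$output <- mapply(describe_runs, ip$V1, ip$V2, USE.NAMES = FALSE)
ip$output
# [1] "x:x-3:5-3"            "x:y-5:7-3"            "x:x-2:3-2; y:z-6:7-2"

# The pattern matters: "-" reports every dash, "-+" reports one match per run.
as.integer(gregexpr("-", "ab---cdef")[[1]])   # 3 4 5
as.integer(gregexpr("-+", "ab---cdef")[[1]])  # 3
```

Rows without any dash get NA here; that choice is also an assumption and can be swapped for an empty string if that suits the data better.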

