
Extract the substring between "-" and "-" in a string in R

I have a list of strings that looks like this:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")

For every string in the list, I need to extract the number between the first and second "-",

so I would get:

[10139833,10139832]

I tried this:

gsub(".*[-]([^-]+)[-]", "\\1", list)

but it returns:

[AC,bf]

What can I do to make it work? Thank you.

Using str_extract from stringr, we can try:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums

[1] "10139833" "10139832"
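As a small extension of this answer (a sketch, not part of the original): str_extract() returns NA for any string where the pattern does not match, so wrapping it in as.numeric() gives the numeric vector the question asks for while keeping the output aligned with the input. The third string below is a made-up example of a non-matching element.

```r
library(stringr)

# Third element is a made-up string with no "-digits-" field
x <- c("chr21-10139833-A-C", "chry-10139832-b-f", "no-digits-here")

# Non-matching strings give NA, so positions stay aligned with the input
nums <- as.numeric(str_extract(x, "(?<=-)\\d+(?=-)"))
nums
# gives 10139833, 10139832, NA
```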

We could also use sub for a base R option:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums

[1] "10139833" "10139832"
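For comparison, the question's own attempt can be repaired. The leading .* is greedy, so it swallows as much as it can and the capture group lands on the last "-...-" pair (the "A" in "-A-"); the unmatched tail "C" is then kept, which is where "AC" comes from. Anchoring at the start and allowing only non-hyphen characters before the first "-" forces the capture onto the second field. This is a minimal sketch assuming every string has at least three hyphen-separated fields; strings that don't match are returned unchanged by sub().

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Original attempt: greedy .* pushes the capture onto "A" / "b",
# and the unmatched tail ("C" / "f") is kept
orig_attempt <- gsub(".*[-]([^-]+)[-]", "\\1", x)
orig_attempt
# gives "AC" "bf"

# Anchored fix: [^-]* cannot cross a hyphen, so the group captures field 2
fixed <- sub("^[^-]*-([^-]+)-.*", "\\1", x)
fixed
# gives "10139833" "10139832"
```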

You can use str_split_i (available since stringr 1.5.0) to get the i-th piece of each split string:

library(stringr)
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")

str_split_i(str, "-", i = 2)
#[1] "10139833" "10139832"
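If you would rather not depend on stringr, a base R sketch of the same "take the i-th piece" idea splits with strsplit() and picks element 2 of each piece with vapply(). fixed = TRUE treats "-" as a literal character; a string with fewer than two pieces would yield NA.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Split each string on a literal "-" and take the 2nd piece of each
pieces <- strsplit(x, "-", fixed = TRUE)
second <- vapply(pieces, `[`, character(1), 2)
second
# gives "10139833" "10139832"
```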

1) Using the input x shown in the Note at the end, use read.table. If you want character output instead, add the colClasses = "character" argument to read.table.

read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832

2) Another possibility is to use strapply from the gsubfn package. If you want character output, omit the as.numeric argument.

library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832

Note

x <- c("chr21-10139833-A-C", "chry-10139832-b-f")
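A sketch of both options without the gsubfn dependency, using the same x: the first line is the character variant of option 1 described above, and the second part is a base R analogue of option 2 (this swaps strapply for regexec() plus regmatches(), where each match vector holds the full match followed by the captured group).

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# 1) read.table with colClasses = "character": column 2 stays as strings
chr <- read.table(text = x, sep = "-", colClasses = "character")[[2]]

# 2) Base R analogue of strapply: regexec() records the full match and the
#    (\\d+) group; element 2 of each match vector is the captured digits
m <- regmatches(x, regexec("-(\\d+)-", x))
num <- as.numeric(vapply(m, `[`, character(1), 2))

chr  # gives "10139833" "10139832"
num  # gives 10139833 10139832
```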

If the structure of your strings is always like this, with word characters separated by hyphens, you could match one or more digits between word boundaries:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_extract(list, "\\b\\d+\\b")

Or, with a Perl-like pattern and \K, you could also use:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
regmatches(list, regexpr("-\\K\\d+(?=-)", list, perl = TRUE))

Both will output:

[1] "10139833" "10139832"
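One thing to watch with regmatches() + regexpr(): strings with no match are silently dropped, so the result can be shorter than the input. Below is a small wrapper (the name extract_middle_number is hypothetical) that keeps NA in those positions, assuming the same \K pattern. Because this pattern needs a hyphen before the digits, a made-up input like "21-10139833-A-C" still yields "10139833", whereas \b\d+\b would return "21" for it.

```r
# Hypothetical helper: same "-\\K\\d+(?=-)" pattern as above,
# but strings without a match give NA instead of being dropped
extract_middle_number <- function(x) {
  m <- regexpr("-\\K\\d+(?=-)", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched elements, in input order
  out[m != -1] <- regmatches(x, m)
  out
}

# Made-up inputs: a numeric first field and a string with no "-digits-" part
y <- c("chr21-10139833-A-C", "21-10139833-A-C", "no-match")
res <- extract_middle_number(y)
res
# gives "10139833" "10139833" NA
```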

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM