
strsplit with multiple delimiters in R

I want to split this genomic coordinate: chr1:713625-714625

so that I get only the start coordinate: 713625

I tried this command:

data.table(unlist(lapply(data$gene,function(x)unlist(strsplit(x, "[:]"))[2])))$V1

but it gives me this: 713625-714625

Do you have any suggestions?

You are almost there with strsplit, but the pattern has to match both delimiters, not just the colon. Use the character class "[:-]" or the alternation ":|-":

> unlist(strsplit("chr1:713625-714625", "[:-]"))[2]
[1] "713625"

> unlist(strsplit("chr1:713625-714625", ":|-"))[2]
[1] "713625"
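
Since strsplit is vectorized, the same pattern can be applied to a whole vector of coordinates at once, without wrapping it in lapply. The following is a minimal sketch; the vector coords and its values are made up for illustration and are not from the question.

```r
# Hypothetical example vector of genomic coordinates (illustrative values only)
coords <- c("chr1:713625-714625", "chr2:100-200", "chrX:5000-6000")

# strsplit() returns a list with one character vector per input string;
# "[:-]" splits on either ':' or '-'
parts <- strsplit(coords, "[:-]")

# Take the second piece (the start coordinate) from each element
starts <- vapply(parts, function(p) p[2], character(1))

# Convert to numeric if the coordinates are needed for arithmetic
starts_num <- as.numeric(starts)
print(starts_num)
```

vapply is used here instead of sapply so that the result is guaranteed to be a character vector with one element per input.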

The following code extracts everything between the : and - in the string:

string <- c("chr1:713625-714625")
gsub(".*[:]([^.]+)[-].*", "\\1", string)

Output:

[1] "713625"
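
As a possible variant, the pattern can be anchored and restricted to digits, which makes the intent a little more explicit. This is only a sketch of an alternative, not a correction of the code above; the variable name start is an assumption introduced here.

```r
string <- c("chr1:713625-714625")

# ^[^:]*:    everything up to and including the first ':'
# ([0-9]+)   the start coordinate, captured as group 1
# -.*$       the '-' and the rest of the string
start <- sub("^[^:]*:([0-9]+)-.*$", "\\1", string)
print(start)
```

Note that sub returns the input unchanged when the pattern does not match, so malformed coordinates would pass through as-is rather than becoming NA.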

I tried these two commands, and both of them give me the same result:

gsub(".*[:]([^.]+)[-].*", "\\1", string) (suggested by Quinten)

data.table(unlist(lapply(data$gene,function(x)unlist(strsplit(x, "[:-]"))[2])))$V1
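
Because the data already lives in a data.table, the tstrsplit helper from that package may be a shorter way to get the same column. The sketch below assumes a table named data with a character column gene, as in the command above; the new column name start and the example rows are hypothetical.

```r
library(data.table)

# Hypothetical stand-in for the real table (illustrative rows only)
data <- data.table(gene = c("chr1:713625-714625", "chr2:100-200"))

# tstrsplit() splits every string and transposes the result into columns;
# keep = 2L retains only the second piece (the start coordinate), and
# type.convert = TRUE turns the character values into numbers
data[, start := tstrsplit(gene, "[:-]", keep = 2L, type.convert = TRUE)]
print(data$start)
```

This adds the start coordinate as a column by reference instead of building a separate data.table and extracting V1.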

