Extract substring between "-" and "-" in a string in R
I have a list of strings that looks like this:
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
For every string in the list I need to extract the number between "-" and "-", so I would get:
[10139833,10139832]
I tried this:
gsub(".*[-]([^-]+)[-]", "\\1", list)
but it returns:
[ac,bf]
What can I do to make it work? Thank you.
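Why the attempt returns the letters: .* is greedy, so the engine backtracks only as far as the last "-field-" pair, which in "chr21-10139833-A-C" is "-A-". gsub keeps the captured "A", and the trailing "C" is never matched, so it stays in the result. The base R sketch below reproduces this and shows one possible fix, anchoring at the start with ^[^-]*- so the capture lands on the field after the first hyphen. The anchored pattern is a suggestion of this edit, not one of the posted answers.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Original attempt: greedy .* makes the match "chr21-10139833-A-",
# only "A" is kept, and the unmatched trailing "C" is left in place
bad <- gsub(".*[-]([^-]+)[-]", "\\1", x)
print(bad)   # c("AC", "bf")

# Anchored alternative (suggestion, not from the thread): [^-]* cannot
# cross a hyphen, so the capture is always the second field, and .*$
# consumes the rest of the string so nothing is left over
good <- sub("^[^-]*-([^-]+)-.*$", "\\1", x)
print(good)  # c("10139833", "10139832")
```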
Using str_extract from stringr, we can try:
library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums
[1] "10139833" "10139832"
We could also use sub for a base R option:
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums
[1] "10139833" "10139832"
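One caveat with the sub approach: when an element contains no "-digits" field, sub returns that element unchanged instead of NA. A minimal guard, assuming NA is wanted for such elements (the third string below is a made-up example), is to test with grepl first and then convert to numbers as the question asked:

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f", "no-number-here")
pat <- ".*-(\\d+).*"

# sub() alone would return "no-number-here" untouched for the third element;
# grepl() flags which elements actually contain "-<digits>"
has_num <- grepl(pat, x)
nums <- ifelse(has_num, sub(pat, "\\1", x), NA_character_)

# convert to numbers, with NA where nothing was found
as.numeric(nums)
```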
You can use str_split_i to get the ith split string:
library(stringr)
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_split_i(str, "-", i = 2)
#[1] "10139833" "10139832"
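str_split_i() is only available in recent stringr releases (1.5.0 and later). For older installations, a base R sketch of the same idea splits with strsplit() and picks the second piece of each result with vapply():

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# strsplit() returns a list with one character vector of fields per string;
# fixed = TRUE treats "-" literally rather than as a regex
parts <- strsplit(x, "-", fixed = TRUE)

# take the 2nd field of each vector; character(1) makes vapply() check
# that every result is a single string (a missing field gives NA)
second <- vapply(parts, `[`, character(1), 2)
second
as.numeric(second)
```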
1) Using the input shown in the Note at the end, use read.table. If you want character output instead, add the colClasses = "character" argument to read.table.
read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832
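A sketch of the character-output variant described above. read.table names the columns V1, V2, ... by default, so the field can also be taken by name. This assumes every string has the same number of hyphen-separated fields; with ragged input read.table may fail unless fill = TRUE is used.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# colClasses = "character" stops read.table from converting any column,
# so the positions stay as strings
df <- read.table(text = x, sep = "-", colClasses = "character")

# default column names are V1..V4; the position is the second field
df$V2
```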
2) Another possibility is to use strapply from gsubfn. If you want character output, omit the as.numeric argument.
library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832
Note: the input used above is

x <- c("chr21-10139833-A-C", "chry-10139832-b-f")
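If installing gsubfn is not an option, a base R sketch of the same capture-group idea pairs regexec() with regmatches(): for each string it returns the full match followed by the captured group, so the number is the second element.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# regexec() records where the whole pattern and the capture group matched;
# regmatches() turns that into a list like c("-10139833-", "10139833")
m <- regmatches(x, regexec("-(\\d+)-", x))

# element 2 of each vector is the captured digits; a string with no match
# yields character(0), and indexing that with 2 gives NA
nums <- as.numeric(vapply(m, `[`, character(1), 2))
nums
```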
If your strings always have this structure, containing only word characters and hyphens, you could match 1+ digits between word boundaries:
library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_extract(list, "\\b\\d+\\b")
Or, with a Perl-like pattern and \K, you might also use:
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
regmatches(list, regexpr("-\\K\\d+(?=-)", list, perl = TRUE))
Both will output:
[1] "10139833" "10139832"
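One thing to keep in mind with regmatches() plus regexpr(): elements that do not match are silently dropped, so the result can be shorter than the input and no longer line up with it (str_extract returns NA instead). A sketch that keeps the alignment, assuming NA is wanted for non-matches (the middle string is made up):

```r
x <- c("chr21-10139833-A-C", "no-number-here", "chry-10139832-b-f")
m <- regexpr("-\\K\\d+(?=-)", x, perl = TRUE)

# regexpr() marks non-matching elements with -1, and regmatches()
# returns only the matched ones
length(regmatches(x, m))  # 2, not 3

# fill a full-length vector, writing matches only where m != -1
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
```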