Extract substring between "-" and "-" in a string in R

I have a list of strings that looks like this:

list=["chr21-10139833-AC","chry-10139832-bf"]

For every string in the list, I need to extract the number between the first "-" and the next "-".

So I would get:

[10139833,10139832]

I tried this:

gsub(".*[-]([^-]+)[-]", "\\1", list)

But it returns:

[ac,bf]

What can I do to make it work? Thank you.

Using str_extract from stringr, we can try:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- str_extract(list, "(?<=-)(\\d+)(?=-)")
nums

[1] "10139833" "10139832"

We could also use sub for a base R option:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
nums <- sub(".*-(\\d+).*", "\\1", list)
nums

[1] "10139833" "10139832"
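As for why the original gsub call returns the trailing letters: a minimal sketch, assuming the real input has the four-field form used in this answer ("chr21-10139833-A-C"). The leading `.*` is greedy, so it swallows everything up to the last pair of hyphens, and the capture group ends up holding the field between those two instead of the number. Anchoring at the start and replacing `.*` with `[^-]*` is one way to pin the capture to the second field; the variable names below are only illustrative.

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Original attempt: ".*" is greedy, so the match runs up to the LAST
# "-field-" pair and the capture group holds "A" (or "b"), not the number.
greedy <- gsub(".*[-]([^-]+)[-]", "\\1", x)
greedy
# [1] "AC" "bf"

# Anchored variant: "[^-]*" cannot cross a hyphen, so the capture group
# is forced onto the second field; ".*$" consumes the rest of the string.
anchored <- sub("^[^-]*-([^-]+)-.*$", "\\1", x)
anchored
# [1] "10139833" "10139832"
```

Wrap the result in as.numeric() if numbers rather than character strings are needed.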

You can use str_split_i (available in stringr 1.5.0 and later) to get the ith piece of each split string:

library(stringr)
str <- c("chr21-10139833-A-C", "chry-10139832-b-f")

str_split_i(str, "-", i = 2)
#[1] "10139833" "10139832"
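If stringr is not available, roughly the same idea can be sketched in base R with strsplit and vapply. This is an illustrative equivalent rather than part of the answer above, and it assumes every string has at least two hyphen-separated fields (otherwise vapply stops with an error because the second piece is missing).

```r
x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

# Split each string on literal hyphens; the result is a list of character vectors.
pieces <- strsplit(x, "-", fixed = TRUE)

# Take the 2nd piece of every element; vapply enforces one string per element.
second <- vapply(pieces, `[`, character(1), 2)
second
# [1] "10139833" "10139832"
```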

1) Using the input shown in the Note at the end, use read.table. If you want character output instead, add the colClasses = "character" argument to read.table.

read.table(text = x, sep = "-")[[2]]
## [1] 10139833 10139832

2) Another possibility is to use strapply from the gsubfn package. If you want character output, omit the as.numeric argument.

library(gsubfn)
strapply(x, "-(\\d+)-", as.numeric, simplify = TRUE)
## [1] 10139833 10139832

Note

x <- c("chr21-10139833-A-C", "chry-10139832-b-f")

If the structure of your strings is always like this, with only word characters and hyphens, you could match one or more digits between word boundaries:

library(stringr)
list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
str_extract(list, "\\b\\d+\\b")

Or, with a Perl-like pattern and \K, you might also use:

list <- c("chr21-10139833-A-C", "chry-10139832-b-f")
regmatches(list, regexpr("-\\K\\d+(?=-)", list, perl = TRUE))

Both will output:

[1] "10139833" "10139832"
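One difference worth keeping in mind: str_extract returns NA for a string with no match, whereas regmatches with regexpr silently drops non-matching elements, so the result can be shorter than the input. A minimal sketch of one way to keep the base R result aligned with the input follows; the second test string is made up purely to show the no-match case.

```r
# The second element is a hypothetical string with no "-digits-" field.
x <- c("chr21-10139833-A-C", "no_hyphens_here")

# regexpr returns -1 for elements where the pattern does not match.
m <- regexpr("-\\K\\d+(?=-)", x, perl = TRUE)

# Pre-fill with NA, then write the matches back into the matching positions.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
out
# [1] "10139833" NA
```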
