R: splitting a string between two characters using strsplit()
Let's say I have the following string:
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
I would like to recover the strings between ";" and "=" to get the following output:
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
Can I use strsplit() with more than one split element?
1) strsplit with matrix. Try this:
> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
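To see why the one-liner in 1) works, it may help to look at the intermediate values. The following is a minimal sketch; the variable names parts, m, keys and values are only illustrative and are not part of the original answer.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# The split argument is a regular expression, so the character class [;=]
# splits on either delimiter and yields alternating keys and values.
parts <- strsplit(s, "[;=]")[[1]]

# matrix() fills column by column, so with 2 rows the keys land in row 1
# and the values in row 2.
m <- matrix(parts, nrow = 2)

keys   <- m[1, ]
values <- m[2, ]
```

This relies on every field having exactly one "=", so that keys and values strictly alternate.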
2) strsplit with gsub. Or this use of strsplit with gsub:
> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
3) strsplit with sub. Or this use of strsplit with sub:
> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
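Approaches 2) and 3) perform the same two steps in opposite order, which is easier to see when the intermediate results are kept. This is only a sketch; the names stripped, fields, via_gsub and via_sub are made up for illustration.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Approach 2: remove every "key=" prefix from the whole string, then split on ";".
stripped <- gsub("[^=;]+=", "", s)
via_gsub <- strsplit(stripped, ";")[[1]]

# Approach 3: split on ";" first, then remove everything up to "=" in each piece.
fields  <- strsplit(s, ";")[[1]]
via_sub <- sub(".*=", "", fields)
```

Note that ".*=" is greedy, so in approach 3) a value that itself contained an "=" would be cut at its last "=".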
4) strapplyc. Or this, which extracts consecutive non-semicolons after equal signs:
> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618" "MIMAT0027618" "hsa-miR-6859-5p" "MI0022705"
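If installing gsubfn is not an option, roughly the same extraction can be sketched in base R with gregexpr() and regmatches(). This is an alternative and not part of the original answer; the names matches and values are illustrative.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# Find each "=" together with the run of non-semicolons that follows it.
matches <- regmatches(s, gregexpr("=[^;]+", s))[[1]]

# Drop the leading "=" from every match, leaving only the values.
values <- sub("^=", "", matches)
```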
ADDED additional strsplit solutions.
I know this is an old question, but I found the usage of lookaround regular expressions quite elegant for this problem:
library(stringr)
your_string <- '/this/file/name.txt'
result <- str_extract(string = your_string, pattern = "(?<=/)[^/]*(?=\\.)")
result
In words, the (?<=...) part looks before the desired string for a ... (in this case a forward slash). [^/]* then matches as many characters in a row as possible that are not a forward slash (in this case name, because the match has to end where the lookahead succeeds). (?=...) then looks after the desired string for a ... (in this case the period, a special character that needs to be escaped as \\.).
This also works on dataframes:
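The effect of each lookaround can be checked by removing them one at a time. The sketch below assumes stringr is installed; the variable names are illustrative only.

```r
library(stringr)

your_string <- "/this/file/name.txt"

# Lookbehind only: the first run of non-slash characters preceded by "/".
behind_only <- str_extract(your_string, "(?<=/)[^/]*")          # "this"

# Lookbehind and lookahead: the run must also be followed by a literal
# period, so "this" and "file" are rejected and the match stops before ".txt".
both <- str_extract(your_string, "(?<=/)[^/]*(?=\\.)")          # "name"

# No lookarounds: the delimiters become part of the match.
no_lookaround <- str_extract(your_string, "/[^/]*\\.")          # "/name."
```

Lookarounds only assert what surrounds the match and consume no characters, which is why the slash and the period are absent from the second result.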
library(dplyr)
strings <- c('/this/file/name1.txt', 'tis/other/file/name2.csv')
df <- as.data.frame(strings) %>%
mutate(name = str_extract(string = strings, pattern = "(?<=/)[^/]*(?=\\.)"))
# Optional
names <- df %>% pull(name)
Or, in your case:
your_string <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"
result <- str_extract(string = your_string, pattern = "(?<=;Alias=)[^;]*(?=;)")
result # Outputs 'MIMAT0027618'
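One caveat with that pattern: it requires a ";" on both sides, so it cannot match the first field (no leading ";") or the last one (no trailing ";"). The following sketch, again assuming stringr, shows the last-field case and one possible way to collect all four values asked for in the question; the variable names are hypothetical.

```r
library(stringr)

your_string <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# The last field is not followed by ";", so the lookahead (?=;) never succeeds.
last_strict <- str_extract(your_string, "(?<=;Derives_from=)[^;]*(?=;)")    # NA

# Dropping the lookahead is enough here: [^;]* already stops at the next ";"
# or at the end of the string.
last_relaxed <- str_extract(your_string, "(?<=;Derives_from=)[^;]*")        # "MI0022705"

# All values at once: every run of non-semicolons that is preceded by "=".
all_values <- str_extract_all(your_string, "(?<==)[^;]+")[[1]]
```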