R: Using strsplit() to split a string between two characters

Question

Let's say I have the following string:

s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

I would like to extract the strings between "=" and ";" to get the following output:

[1] "MIMAT0027618"  "MIMAT0027618"  "hsa-miR-6859-5p"  "MI0022705"

Can I use strsplit() with more than one split element?

Answer 1

1) strsplit with matrix. Try this:

> matrix(strsplit(s, "[;=]")[[1]], 2)[2,]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"
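
To see why approach 1 works, it can help to look at the intermediate values. This is only a sketch: the names parts, m, keys and values are mine, and it assumes every field has exactly one "=" and a non-empty value.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# split is treated as a regular expression, so the character class [;=]
# splits on either character in a single call
parts <- strsplit(s, "[;=]")[[1]]
# parts now alternates: key, value, key, value, ...

# matrix() fills column by column, so with 2 rows each column is one key/value pair
m <- matrix(parts, nrow = 2)
keys <- m[1, ]    # first row holds the keys
values <- m[2, ]  # second row holds the values
values
```

So the answer to "more than one split element" is yes, as long as the alternatives are written as one regular expression.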

2) strsplit with gsub. Or this use of strsplit with gsub:

> strsplit(gsub("[^=;]+=", "", s), ";")[[1]]
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"
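
Approach 2 is easier to follow in two steps. A rough sketch (the name stripped is just for illustration): gsub first deletes every "key=" prefix, which leaves a plain semicolon-separated list for strsplit.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# [^=;]+= matches a run of characters that are neither "=" nor ";",
# followed by "=", i.e. each key together with its equal sign
stripped <- gsub("[^=;]+=", "", s)
stripped

# only the values remain, separated by ";"
values <- strsplit(stripped, ";")[[1]]
values
```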

3) strsplit with sub. Or this use of strsplit with sub:

> sub(".*=", "", strsplit(s, ";")[[1]])
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"
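
One thing to keep in mind with approach 3: ".*=" is greedy, so it removes everything up to the last "=" in each piece. That makes no difference for the string in the question, but it would if a value itself contained "=". The string s2 below is made up purely to illustrate this, and "^[^=]*=" is one possible alternative, not part of the original answer.

```r
# hypothetical input where one value contains an equal sign
s2 <- "Name=a=b;Note=x"
pieces <- strsplit(s2, ";")[[1]]

# greedy: strips through the LAST "=" of each piece
greedy <- sub(".*=", "", pieces)
greedy

# anchored: strips only through the FIRST "=" of each piece
first_only <- sub("^[^=]*=", "", pieces)
first_only
```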

4) strapplyc. Or this, which extracts consecutive non-semicolons after equal signs:

> library(gsubfn)
> strapplyc(s, "=([^;]+)", simplify = unlist)
[1] "MIMAT0027618"    "MIMAT0027618"    "hsa-miR-6859-5p" "MI0022705"
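
If installing gsubfn is not an option, roughly the same extraction can be sketched in base R. This is my own variant rather than part of the original answer: it replaces the capture group with a lookbehind, which needs perl = TRUE.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# (?<==) requires an "=" immediately before the match without including it;
# [^;]+ then takes the run of non-semicolons that follows
hits <- gregexpr("(?<==)[^;]+", s, perl = TRUE)

# regmatches() returns a list with one element per input string
values <- regmatches(s, hits)[[1]]
values
```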

ADDED: additional strsplit solutions.

Answer 2

I know this is an old question, but I find lookaround regular expressions quite elegant for this problem:

library(stringr)
your_string <- '/this/file/name.txt'
result <- str_extract(string = your_string, pattern = "(?<=/)[^/]*(?=\\.)")
result # Outputs 'name'

In words:

The (?<=...) part is a lookbehind: it requires that ... (in this case a forward slash) comes immediately before the desired string.
The [^/]* then matches as many consecutive characters as possible that are not a forward slash (in this case it first tries name.txt, then backs off to name so that the lookahead can succeed).
The (?=...) part is a lookahead: it requires that ... comes immediately after the desired string (in this case a literal period, which is a special regex character and must be escaped as \\. in an R string).
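
The three parts described above can also be tried out in base R, without stringr. This is just a sketch to make the pieces visible; the second path is a made-up example with two dots, and the variable names are mine.

```r
path <- "/this/file/name.txt"

# full pattern: lookbehind for "/", run of non-slashes, lookahead for "."
pattern <- "(?<=/)[^/]*(?=\\.)"
base_name <- regmatches(path, regexpr(pattern, path, perl = TRUE))
base_name

# without the lookahead, anchoring at the end instead keeps the extension
with_ext <- regmatches(path, regexpr("(?<=/)[^/]*$", path, perl = TRUE))
with_ext

# hypothetical path with two dots: the greedy run backs off only as far as
# the last "." it can find, so just the final extension is dropped
path2 <- "/a/b/archive.tar.gz"
two_dots <- regmatches(path2, regexpr(pattern, path2, perl = TRUE))
two_dots
```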

This also works on data frames:

library(dplyr)
strings <- c('/this/file/name1.txt', 'tis/other/file/name2.csv')
df <- as.data.frame(strings) %>% 
  mutate(name = str_extract(string = strings, pattern = "(?<=/)[^/]*(?=\\.)"))
# Optional
names <- df %>% pull(name)

Or, in your case:

your_string <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705" 
result <- str_extract(string = your_string, pattern = "(?<=;Alias=)[^;]*(?=;)") 
result # Outputs 'MIMAT0027618'
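
A caveat worth checking if you adapt this pattern to other fields: the last field (Derives_from) is not followed by ";", so a lookahead that insists on ";" finds nothing there (with str_extract that would show up as NA). A possible tweak, shown here in base R as a sketch with variable names of my own, is to let the lookahead accept either ";" or the end of the string.

```r
s <- "ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-miR-6859-5p;Derives_from=MI0022705"

# same style as above: requires a ";" right after the value
pat_semicolon <- "(?<=Derives_from=)[^;]*(?=;)"
no_match <- regmatches(s, regexpr(pat_semicolon, s, perl = TRUE))
length(no_match)  # 0, because the last field has no trailing ";"

# accept either ";" or the end of the string after the value
pat_either <- "(?<=Derives_from=)[^;]*(?=;|$)"
last_value <- regmatches(s, regexpr(pat_either, s, perl = TRUE))
last_value
```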

Answer 1
17, accepted, 2014-02-09 14:08:53

Answer 2
1, 2020-03-30 16:21:11
