Extracting multiple instances of a pattern from a string in R

Question

I have a character vector t as follows.

t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
    "GID895 GID895 K350")

I would like to extract all the strings starting with GID and followed by a sequence of digits.

This works, but does not retrieve multiple instances:

gsub(".*(GID\\d+).*", "\\1", t)
[1] "GID456"  "GID667"  "GID2345" "GID895"

How can I extract all the matching strings in this case? The desired output is as follows:

out <- c("GID456", "GID456", "GID667", "GID45345", "GID2345", 
        "GID895", "GID895")

Answer 1

Here's an approach using qdapRegex, a package I maintain (I prefer this or stringi/stringr to base for consistency and ease of use). I also show a base approach. In any event, I'd look at this more as an "extraction" problem than a substitution problem.

y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
    "GID895 GID895 K350")

library(qdapRegex)
unlist(ex_default(y, pattern = "GID\\d+"))

## [1] "GID456"   "GID456"   "GID667"   "GID45345" "GID2345"  "GID895"   "GID895"

In base R:

unlist(regmatches(y, gregexpr("GID\\d+", y)))
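If you also need to know which input string each match came from, it can help to look at the list before flattening it. Here is a minimal sketch in base R; the names `matches`, `n_matches` and `source_index` are mine, chosen for illustration.

```r
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
    "GID895 GID895 K350")

# gregexpr() locates every match; regmatches() pulls out the matched text.
# The result is a list with one character vector per element of y.
matches <- regmatches(y, gregexpr("GID\\d+", y))

# Number of matches found in each input string
n_matches <- lengths(matches)

# Repeat each input index once per match, so the flat vector can be traced back
source_index <- rep(seq_along(y), times = n_matches)

data.frame(source = source_index, id = unlist(matches))
```

`unlist()` throws this grouping away, which is fine when only the flat vector of IDs is wanted, as in the question.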

Answer 2

Through gsub:

> t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", 
+        "GID895 GID895 K350")
> unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", t), "\\s+"))
[1] "GID456"   "GID456"   "GID667"   "GID45345" "GID2345" 
[6] "GID895"   "GID895"
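As far as I can tell, the reason this works where the pattern in the question does not is this: in `.*(GID\\d+).*` the leading `.*` is greedy and swallows everything up to the last possible match, while the alternation `(GID\\d+)|.` walks through the string and turns every character that is not part of a match into a space. The sketch below shows the intermediate step on made-up input (the vector `x` and the names `spaced` and `pieces` are mine). It also shows one caveat: when a string does not start with a match, `strsplit` returns a leading empty string that has to be dropped.

```r
x <- c("GID456 GID667 VINK", "VINK GID1 GID22")

# The greedy ".*" gives back only as much as it must, so only the last ID survives
last_only <- sub(".*(GID\\d+).*", "\\1", x)

# Each match is kept (plus a space); every other character becomes a space
spaced <- gsub("(GID\\d+)|.", "\\1 ", x)

# Splitting on whitespace leaves "" in front when a string starts with a non-match
pieces <- unlist(strsplit(spaced, "\\s+"))

# Drop the empty strings to get just the IDs
ids <- pieces[nzchar(pieces)]
ids
```

With the data from the question every string starts with a GID, so the empty-string case never shows up there.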

Answer 3

I have used the str_split function from the stringr package.

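library(stringr)
# split each string on whitespace, then flatten into one vector of words
word.list <- str_split(t, '\\s+')
new_list <- unlist(word.list)
# keep the words that contain "GID"
new_list[grep("GID", new_list)]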

I hope this helps.
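One thing to watch with the split-and-filter approach: an unanchored `"GID"` pattern keeps any token that merely contains those letters, so tokens like "GIDEON" or "XGID12" would slip through. If the tokens should be exactly GID followed by digits, an anchored pattern is stricter. The sketch below uses base `strsplit` in place of `str_split` so it runs without extra packages, and the input vector `s` is made up for illustration, not taken from the question.

```r
s <- c("GID456 SPK711", "GIDEON GID667 XGID12")

# Split on whitespace and flatten into one vector of tokens
tokens <- unlist(strsplit(s, "\\s+"))

# Unanchored: keeps every token that contains "GID" anywhere
loose <- tokens[grepl("GID", tokens)]

# Anchored: keeps only tokens that are "GID" followed by digits and nothing else
strict <- tokens[grepl("^GID\\d+$", tokens)]

loose
strict
```

Note that the anchored version works on whole tokens, so unlike the extraction answers it would not pull "GID12" out of "XGID12". Which behaviour is right depends on the data.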

Answer 4

I'm late to the party, but this tidyverse one-liner might be useful for someone.

With stringr + dplyr:

library(stringr)
library(dplyr)  # provides the %>% pipe
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", "GID895 GID895 K350")
str_extract_all(t, regex("GID\\d+")) %>% unlist()

gives:

[1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"

4 answers

Answer 1
11 accepted 2015-05-12 05:18:26

Answer 2
3 2015-05-12 06:17:11

Answer 3
1 2015-05-12 06:15:00

Answer 4
1 2018-06-06 02:52:20
