Extract multiple instances of a pattern from a string in R
I have a character vector t as follows.
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
"GID895 GID895 K350")
I would like to extract all the strings that start with GID and are followed by a sequence of digits.
The following works, but it returns only one match per element (the last one) rather than every instance.
gsub(".*(GID\\d+).*", "\\1", t)
[1] "GID456" "GID667" "GID2345" "GID895"
How can I extract all the matching strings in this case? The desired output is as follows.
out <- c("GID456", "GID456", "GID667", "GID45345", "GID2345",
"GID895", "GID895")
Here's an approach using qdapRegex, a package I maintain (I prefer this or stringi/stringr to base R for consistency and ease of use). I also show a base approach. In any event, I'd look at this more as an "extraction" problem than a substitution problem.
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
"GID895 GID895 K350")
library(qdapRegex)
unlist(ex_default(y, pattern = "GID\\d+"))
## [1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"
In base R:
unlist(regmatches(y, gregexpr("GID\\d+", y)))
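The base call above flattens everything into one vector. If it matters which input element each match came from, the list returned by regmatches can be kept as it is. A minimal sketch on the same data; the names per_element and ids are only for illustration:

```r
y <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
       "GID895 GID895 K350")

# gregexpr() locates every match in each element;
# regmatches() returns the matched substrings as a list, one entry per element
per_element <- regmatches(y, gregexpr("GID\\d+", y))

# how many matches each input string produced
lengths(per_element)

# flatten only when the grouping is not needed
ids <- unlist(per_element)
```

Keeping the list form makes it easy to attach the matches back to the original rows later, for example with rep(seq_along(y), lengths(per_element)).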
Through gsub:
> t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
+ "GID895 GID895 K350")
> unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", t), "\\s+"))
[1] "GID456" "GID456" "GID667" "GID45345" "GID2345"
[6] "GID895" "GID895"
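One thing to watch with this gsub trick: every character that is not part of a GID token is turned into a space, so an element that begins with something other than a GID token starts with a separator, and strsplit then returns an empty string for it. A hedged sketch of how that could be handled, using a hypothetical input x that is not from the question:

```r
# hypothetical input: the first element does not start with a GID token
x <- c("SPK711 GID456", "GID895 K350")

# same substitution as above: keep GID tokens, blank out everything else
parts <- unlist(strsplit(gsub("(GID\\d+)|.", "\\1 ", x), "\\s+"))

# drop the empty strings produced by leading separators
ids2 <- parts[nzchar(parts)]
```

With the data in the question every element starts with a GID token, which is why no empty strings appear in the output above.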
I have used the str_split function from the stringr package.
library(stringr)
word.list = str_split(t, '\\s+')
new_list <- unlist(word.list)
new_list[grep("GID", new_list)]
I hope this helps.
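Note that filtering with the plain pattern "GID" keeps any token that merely contains those letters, whether or not digits follow. If only tokens of the exact form GID plus digits are wanted, the pattern can be anchored. A minimal sketch of that variation, written with base strsplit so it runs without extra packages; the names tokens and ids3 are illustrative:

```r
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345",
       "GID895 GID895 K350")

# split every element on whitespace and pool the tokens
tokens <- unlist(strsplit(t, "\\s+"))

# keep only tokens that are exactly "GID" followed by one or more digits
ids3 <- grep("^GID\\d+$", tokens, value = TRUE)
```

This still assumes the IDs are separated from other text by whitespace; an ID embedded in a longer token would be missed, which is where the extraction approaches in the other answers are more robust.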
I'm late to the party, but this tidyverse one-liner might be useful to someone.
With stringr + dplyr:
library(stringr)
library(dplyr)  # provides the %>% pipe
t <- c("GID456 SPK711", "GID456 GID667 VINK", "GID45345 DNP990 GID2345", "GID895 GID895 K350")
str_extract_all(t, regex("GID\\d+")) %>% unlist()
gives:
[1] "GID456" "GID456" "GID667" "GID45345" "GID2345" "GID895" "GID895"