R: extract list of matching parts of a string via regex
Let's say that I need to extract different parts of a string as a list; for example, I would like to divide the string "aaa12xxx" into three parts.
One possibility is to make three gsub calls:
parts = c()
parts[1] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\1', "aaa12xxx")
parts[2] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")
parts[3] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\3', "aaa12xxx")
Of course this seems quite wasteful (even if it's inside a for loop). Isn't there a function that simply returns the list of parts, given a regex and a test string?
Just split the input string with strsplit and pick the parts you want.
> x <- "aaa12xxx"
> strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE)
[[1]]
[1] "aaa" "12" "xxx"
Get the individual parts by index:
> m <- unlist(strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE))
> m[1]
[1] "aaa"
> m[2]
[1] "12"
> m[3]
[1] "xxx"
(?<=[[:alpha:]])(?=\\d)
Matches every boundary that is preceded by a letter and followed by a digit.
|
OR
(?<=\\d)(?=[[:alpha:]])
Matches every boundary that is preceded by a digit and followed by a letter.
Both alternatives are zero-width, so no characters are consumed. Splitting the input at the matched boundaries gives the desired output.
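To check that the boundary split also holds when letters and digits alternate more than once, here is a minimal sketch. The helper name split_alnum is hypothetical (it is not part of the answer above); it only wraps the same strsplit call so that it can be applied to several strings at once.

```r
# Hypothetical helper: wraps the strsplit call from the answer above.
split_alnum <- function(x) {
  # Zero-width boundaries: letter|digit or digit|letter.
  pattern <- "(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])"
  # perl = TRUE is required because lookarounds are PCRE syntax.
  strsplit(x, pattern, perl = TRUE)
}

# strsplit is vectorised, so the result is a list with one element per input.
parts <- split_alnum(c("aaa12xxx", "ab1cd23"))
print(parts[[1]])  # "aaa" "12" "xxx"
print(parts[[2]])  # "ab" "1" "cd" "23"
```

Because the result is a list, use parts[[i]] (or unlist for a single input) before indexing the pieces.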
(\\d+)|([a-zA-Z]+)
or
([[:alpha:]]+)|([0-9]+)
You can just grab the captures. Use str_match_all() from library(stringr). See the demo.
https://regex101.com/r/fA6wE2/8
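As a rough sketch of how these patterns can be applied from R, the following uses base functions, so it runs without extra packages. Using gregexpr/regexec with regmatches here is an assumption about a convenient base-R equivalent, not something stated in the answer; the stringr call is only run if that package happens to be installed.

```r
x <- "aaa12xxx"

# All matches of "letters OR digits", in order of appearance (base R).
all_parts <- regmatches(x, gregexpr("[[:alpha:]]+|[0-9]+", x))[[1]]
print(all_parts)  # "aaa" "12" "xxx"

# Capture groups of the three-group pattern from the question, in one call.
# regexec returns the full match first, then each group; [-1] drops the full match.
m <- regexec("([[:alpha:]]+)([0-9]+)([[:alpha:]]+)", x)
groups <- regmatches(x, m)[[1]][-1]
print(groups)  # "aaa" "12" "xxx"

# Optional: the stringr route mentioned above, only if the package is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  # str_match_all returns a list of matrices; column 1 holds the full matches.
  print(stringr::str_match_all(x, "(\\d+)|([a-zA-Z]+)")[[1]][, 1])
}
```

The gregexpr variant suits strings with an unknown number of parts, while the regexec variant fits the fixed three-part layout in the question.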