R: extract list of matching parts of a string via regex

Original post: 2015-01-13 12:25:00. Tags: regex, r, string, substring, string-matching

Say I need to extract different parts of a string as a list. For example, I would like to divide the string "aaa12xxx" into three parts.

One possibility is to make three gsub calls:

parts = c()
parts[1] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\1', "aaa12xxx")
parts[2] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\2', "aaa12xxx")
parts[3] = gsub('([[:alpha:]]+)([0-9]+)([[:alpha:]]+)', '\\3', "aaa12xxx")

Of course, this seems quite wasteful (even if it's inside a for loop). Isn't there a function that simply returns the list of parts, given a regex and a test string?

2 answers

Just split the input string with strsplit and take the parts you want.

> x <- "aaa12xxx"
> strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE)
[[1]]
[1] "aaa" "12"  "xxx"

Get the individual parts by index:

> m <- unlist(strsplit(x,"(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])", perl=TRUE))
> m[1]
[1] "aaa"
> m[2]
[1] "12"
> m[3]
[1] "xxx"

(?<=[[:alpha:]])(?=\\d) matches every boundary that is preceded by a letter and followed by a digit.
| OR
(?<=\\d)(?=[[:alpha:]]) matches every boundary that is preceded by a digit and followed by a letter.
Splitting the input at the matched boundaries gives the desired output.
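As a rough illustration of the split described above, the lookaround pattern can be wrapped in a small helper. The function name split_alpha_digit is hypothetical (it is not part of the answer or of base R); only strsplit with perl = TRUE comes from the answer itself.

```r
# Hypothetical helper (name is illustrative): split each string at every
# boundary between a letter and a digit, using the answer's lookaround pattern.
split_alpha_digit <- function(x) {
  boundary <- "(?<=[[:alpha:]])(?=\\d)|(?<=\\d)(?=[[:alpha:]])"
  # perl = TRUE is required because lookbehind/lookahead are PCRE features.
  strsplit(x, boundary, perl = TRUE)
}

# strsplit returns a list with one element per input string.
parts <- split_alpha_digit("aaa12xxx")[[1]]
print(parts)  # "aaa" "12" "xxx", as in the answer's output above
```

Because strsplit is vectorised, the same helper should also accept a character vector, for example split_alpha_digit(c("aaa12xxx", "b7c")), returning one list element per string.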

(\\d+)|([a-zA-Z]+)

or

([[:alpha:]]+)|([0-9]+)

You can just grab the captures. Use str_match_all() from library(stringr). See the demo:

https://regex101.com/r/fA6wE2/8
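For readers who prefer to stay in base R, a minimal sketch of the same idea is given below. It swaps in regmatches() with gregexpr() and regexec() in place of stringr's str_match_all(), so it is not the answerer's code; the variable names are illustrative only.

```r
x <- "aaa12xxx"

# Variant 1: collect every match of "a run of letters OR a run of digits".
# gregexpr finds all matches; regmatches extracts the matched substrings.
all_parts <- regmatches(x, gregexpr("[[:alpha:]]+|[0-9]+", x))[[1]]
print(all_parts)  # "aaa" "12" "xxx"

# Variant 2: reuse the three-group pattern from the question.
# regexec reports the full match followed by each capture group.
m <- regmatches(x, regexec("([[:alpha:]]+)([0-9]+)([[:alpha:]]+)", x))[[1]]
groups <- m[-1]   # drop the first element, which is the full match
print(groups)     # "aaa" "12" "xxx"
```

The first variant works for any number of alternating letter and digit runs, while the second returns exactly the capture groups of one pattern, which is closer to what the question asks for in place of three gsub calls.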
