
Multiple overlapping regex matches instead of one

Consider this string:

data <- "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-I1-1-I2-1-TR-1-I1-1-I2-1-FA-1-I3-1-I1-1-FA-1-FA-1-NR-1-I3-1-I2-1-TR-1-I1-1-I2-1-I1-1-I2-1-FA-1-I2-1-I1-1-I3-1-FA-1-QU-1-I1-1-I2-1-I2-1-I2-1-NR-1-I2-1-I2-1-NR-1-I1-1-I2-1-I1-1-NR-1-I3-1-QU-1-I2-1-I3-1-QU-1-NR-1-I2-1-I1-1-NR-1-QU-1-QU-1-I2-1-I1-1-EX"

and this regex:

"(I3).{1,}(I3)"

This would match the section between the first I3 and the last I3. However, how should I modify the regex to match each separate section beginning and ending with I3? E.g.:

I3-1-FA-1-I1-1-I2-1-TR-1-I1-1-I2-1-FA-1-I3
I3-1-I1-1-FA-1-FA-1-NR-1-I3
I3-1-I2-1-TR-1-I1-1-I2-1-I1-1-I2-1-FA-1-I2-1-I1-1-I3
I3-1-FA-1-QU-1-I1-1-I2-1-I2-1-I2-1-NR-1-I2-1-I2-1-NR-1-I1-1-I2-1-I1-1-NR-1-I3
I3-1-QU-1-I2-1-I3

Use the non-greedy form and a positive lookahead.

"(?=(I3.+?I3))"

Fetch the string you want from group index 1. Lookaheads help with overlapping matches: a lookahead consumes no characters, so the I3 that closes one section can also open the next. In R, you must set the perl=TRUE parameter.
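Because the lookahead match itself is empty, `regmatches()` would only return empty strings here; in R the group text has to be read from the `capture.start` and `capture.length` attributes that `gregexpr()` attaches when `perl = TRUE`. A minimal sketch, assuming a shortened sample string `x` (not the original data) and illustrative variable names:

```r
# Shortened, hypothetical sample with four I3 markers (three sections).
x <- "1-FA-1-I3-1-FA-1-I3-1-NR-1-I3-1-QU-1-I3-1-EX"

# Plain non-greedy match: each match consumes its closing I3,
# so the middle section is skipped.
plain <- regmatches(x, gregexpr("I3.+?I3", x, perl = TRUE))[[1]]

# Zero-width lookahead: the match is empty, group 1 holds the section.
m <- gregexpr("(?=(I3.+?I3))", x, perl = TRUE)[[1]]

# Start offsets and lengths of capture group 1, one entry per match.
starts <- attr(m, "capture.start")[, 1]
lens   <- attr(m, "capture.length")[, 1]

sections <- substring(x, starts, starts + lens - 1)

print(plain)     # two non-overlapping sections
print(sections)  # all three overlapping sections
```

With this sample, `plain` holds only the first and third sections, while `sections` also includes `"I3-1-NR-1-I3"`.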


You can use strsplit together with gsub like this:

data <- "1-FA-1-I2-1-I2-1-I2-1-EX-1-I2-1-I3-1-FA-1-I1-1-I2-1-TR-1-I1-1-I2-1-FA-1-I3-1-I1-1-FA-1-FA-1-NR-1-I3-1-I2-1-TR-1-I1-1-I2-1-I1-1-I2-1-FA-1-I2-1-I1-1-I3-1-FA-1-QU-1-I1-1-I2-1-I2-1-I2-1-NR-1-I2-1-I2-1-NR-1-I1-1-I2-1-I1-1-NR-1-I3-1-QU-1-I2-1-I3-1-QU-1-NR-1-I2-1-I1-1-NR-1-QU-1-QU-1-I2-1-I1-1-EX"
data <- gsub(".*?(I3.*?)(?=I3)", "\\1I3§", data, perl=TRUE)
strsplit(gsub("[^§]*$", "", data),"§")

The .*?(I3.*?)(?=I3) regex (with the \\1I3§ replacement) removes all text before each I3...I3 section, re-appends the closing I3 that the lookahead did not consume so that every entry in the output is fully enclosed in I3, and adds a marker symbol § (any character that does not occur in the data will do). A second gsub then removes the unneeded trailing part of the string, and strsplit does the final step: it splits on § and returns the expected results.
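To make the three steps easier to follow, here is a small sketch that prints each intermediate value. It assumes a shortened sample string and uses `#` as the marker instead of `§` (any character absent from the data works); the variable names are illustrative only.

```r
# Shortened, hypothetical sample with three I3 markers (two sections).
x <- "1-FA-1-I3-1-FA-1-I3-1-NR-1-I3-1-EX"

# Step 1: drop the text before each section, restore the closing I3
# that the lookahead left unconsumed, and append the marker.
marked <- gsub(".*?(I3.*?)(?=I3)", "\\1I3#", x, perl = TRUE)
print(marked)   # "I3-1-FA-1-I3#I3-1-NR-1-I3#I3-1-EX"

# Step 2: remove everything after the last marker.
trimmed <- gsub("[^#]*$", "", marked)
print(trimmed)  # "I3-1-FA-1-I3#I3-1-NR-1-I3#"

# Step 3: split on the marker; strsplit returns a list, so take element 1.
parts <- strsplit(trimmed, "#")[[1]]
print(parts)
```

Note that the leftover tail after step 1 still starts with the last I3, which is why step 2 is needed before splitting.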


Output:

[1] "I3-1-FA-1-I1-1-I2-1-TR-1-I1-1-I2-1-FA-1-I3"                                   
[2] "I3-1-I1-1-FA-1-FA-1-NR-1-I3"                                                  
[3] "I3-1-I2-1-TR-1-I1-1-I2-1-I1-1-I2-1-FA-1-I2-1-I1-1-I3"                         
[4] "I3-1-FA-1-QU-1-I1-1-I2-1-I2-1-I2-1-NR-1-I2-1-I2-1-NR-1-I1-1-I2-1-I1-1-NR-1-I3"
[5] "I3-1-QU-1-I2-1-I3"   
