
Perl Regex Find and Return Every Possible Match

I'm trying to create a while loop that will find every possible substring within a string, but so far all I can match is the longest instance or the shortest. For example, I have the string

EDIT: changed the string for demo purposes.

"A.....B.....B......B......B......B"

And I want to find every possible sequence of "A.......B".

This code gives me the shortest possible match and then exits the while loop:

while($string =~ m/(A(.*?)B)/gi) {
    print "found\n";
    my $substr = $1;
    print $substr."\n";
}

And this gives me the longest match and then exits the while loop:

$string =~ m/(A(.*)B)/gi

But I want it to loop through the string, returning every possible match. Does anyone know if Perl allows for this?

EDIT: added the desired output below.

found
A.....B
found
A.....B.....B
found
A.....B.....B......B
found
A.....B.....B......B......B
found
A.....B.....B......B......B......B

There are various ways to parse the string to pick out what you want.

For example, use a regex to step through all A...A substrings and process each capture:

use warnings;
use strict;
use feature 'say';

my $s = "A.....B.....B......B......B......B";

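# Lazy .*? stops at the next A (or at the end of the string)
while ($s =~ m/(A.*?)(?=A|$)/gi) {
    # Capturing parens in the separator keep each B in the list
    my @seqs = split /(B)/, $1;
    for my $i (0..$#seqs) {
        # Odd $i means the slice ends with a B
        say @seqs[0..$i] if $i % 2 != 0;
    }
}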

The (?=A|$) is a lookahead, so the capture matches everything up to the next A (or the end of the string), but that A is not consumed and so is still there for the next match. The split uses () in the separator pattern so that the separator is returned too (so we keep all those B's). The loop prints only slices with an even number of elements, that is, only substrings ending with the separator (B here).
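To see the split step on its own, here is a minimal sketch on a shorter string. The names $chunk, @parts and @prefixes are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;
use feature 'say';

# Illustrative input, shorter than the demo string
my $chunk = "A..B...B";

# A capturing group in the separator makes split return the separators too;
# the trailing empty field after the last B is dropped by split
my @parts = split /(B)/, $chunk;

# Odd indices hold a B, so the slice 0..$i ends on a B exactly when $i is odd
my @prefixes = map { join '', @parts[0 .. $_] } grep { $_ % 2 } 0 .. $#parts;

say for @prefixes;
```

Each element of @prefixes is one "A...B" candidate, built by joining the pieces up to and including a separator.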

The above prints

A.....B
A.....B.....B
A.....B.....B......B
A.....B.....B......B......B
A.....B.....B......B......B......B

There may be bioinformatics modules that do this but I am not familiar with them.
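As for whether the regex engine itself can be made to visit every possible match: one option, different from the split approach above, is to record each candidate in an embedded code block and then force a failure so the engine backtracks to the next one. This is only a sketch. The helper name all_matches is hypothetical, the A and B delimiters are hard-coded, and it assumes Perl 5.18 or later, where lexical variables inside (?{ ... }) blocks behave reliably.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the original post): returns every substring
# that starts with A and ends with B, for every starting A.
sub all_matches {
    my ($string) = @_;
    my @found;

    # (?{ ... }) runs each time the engine reaches it; $^N holds the text of
    # the most recently closed capture group. (*FAIL) then rejects the match,
    # so the engine backtracks into .*? and tries the next longer candidate.
    # The overall match always fails; only the side effect matters.
    $string =~ /(A.*?B)(?{ push @found, $^N })(*FAIL)/;

    return @found;
}

say for all_matches("A.....B.....B......B......B......B");
```

Because .*? is lazy, candidates for a given A are collected from shortest to longest. Every backtracking path is explored, so this can get slow on long strings with many A's and B's.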
