perl6 Regex match conjunction &&

The Perl 6 regex conjunction && succeeds only if all parts of the conjunction match the same substring, rather than each part matching somewhere in the whole string:

> my $a="123abc456def";
123abc456def
> so $a ~~ m/ 23 && ef /
False

It is False because "23" in the conjunction matched the "23" substring in $a, but this substring does not match "ef" in the conjunction. This is a little counter-intuitive, because it is easier to interpret $a~~m/23&&ef/ as "$a matches 23 and $a matches ef" than as "$a has a substring that matches 23 and this same substring also matches ef".

If I have n regexes and I want to see if all these n regexes match the same whole string rather than match the same substring part of the whole string, then what is the best way to write the perl6 expression?

In the example, I really mean to do

so (($a ~~ /23/) && ($a ~~ /ef/))

If the number of regexes is large, then the above is harder to write except with a loop:

so (gather {for @myRegexes { take $a ~~ / $_ /; } }).all

Is there a simpler way?

With alternations, it is much easier to read as "$a matches 23 or $a matches ef" rather than "the part of $a that matches 23 or matches ef":

> so $a ~~ m/ 23 || ef /
True

Thanks!

lisprog

You can use a Junction of the two regexes in order to only mention $a once.

my $a = 'abcdef12345'; say so $a ~~ /23/ & /ef/   # True
my $a = 'abcde12345'; say so $a ~~ /23/ & /ef/    # False 
my $a = 'abcdef1245'; say so $a ~~ /23/ & /ef/    # False

To form the junction from an array of regexes, call .all on the array.
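As a minimal sketch of that (the array name @regexes is only an illustration, not something from the question), the junction can be built from the array and used directly on the right-hand side of the smartmatch:

```raku
my $a = 'abcdef12345';

# Hypothetical list of patterns; any number of Regex objects works the same way.
my @regexes = /23/, /ef/;

# @regexes.all builds an all() junction, so the smartmatch threads over every regex.
say so $a ~~ @regexes.all;   # True
```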

If it's really just literal strings to find, then contains is likely to run quite a bit faster:

my $a = 'abcdef12345'; say so $a.contains(all('23', 'ef'))   # True
my $a = 'abcde12345'; say so $a.contains(all('23', 'ef'))    # False
my $a = 'abcdef1245'; say so $a.contains(all('23', 'ef'))    # False

A solution focusing on simplicity, not speed

Ignoring regexes for a moment, the generic P6 construct for shortening foo op bar and foo op baz is foo op bar & baz, provided op is pure in the sense that it's OK to run multiple calls to it in parallel.

(The main language's & operator is a Junction operator. Junctions are conjunctions with two key characteristics; one is their syntactic brevity/simplicity/clarity; the other is their parallel processing semantics.)
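For a non-regex illustration of the same shorthand (the numbers are made up for this example), a plain comparison threads over a junction in the same way, because & binds more tightly than the comparison operators:

```raku
# 5 > 3 & 4 is read as 5 > all(3, 4), i.e. "5 > 3 and 5 > 4".
say so 5 > 3 & 4;   # True

# One failing operand makes the all() junction collapse to False.
say so 5 > 3 & 6;   # False
```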

Applying this to the ~~ op in your regex match:

my $a="123abc456def";
say so $a ~~ / 23 / & / ef /

The above is often suitable provided the bar & baz & ... fits nicely in a single line.

An alternative that still uses junctional logic but skips the infix operator between operands and scales better to larger lists of patterns to match is something like:

my @keywords = <12 de>;
say so all ( $a.match: / $_ / for @keywords ) ;

(with thanks to @lisprogtor for spotting and patiently explaining the bug in my original code for this bit.)

Solutions focusing on speed, not simplicity

There will be many ways to optimize for speed. I'll provide just one.

If all or most of your patterns are just strings rather than regexes, then use the .contains method rather than regexes for the strings:

say so all ( $a.contains: $_ for <23 ef> ) ;

Intuitiveness

it is easier to interpret $a~~m/23&&ef/ as "$a matches 23 and $a matches ef"

Yes and no.

Yes, in the sense that there's ambiguity to "matches a and b"; and that your guess is one of several reasonable ones for anyone exploring regexes in general; and, in particular, that your guess is evidently the one you currently find most appropriate aka "easiest".

No, if our iofo's were to match.

(I just invented "iofo". I'm using it to mean "in our friendly opinion", a version of ioho that is not only genuinely intended humbly but also with open arms, conjuring an opinion that I/we imagine might one day be happily shared by some readers.)

Iofo we find it easier to read $a~~m/23&&ef/ as "$a matches 23 and ef" rather than "$a matches 23 and $a matches ef". But of course, "$a matches 23 and ef" remains ambiguous.

For the reading you suggest we have junctions, as explained above:

say so $a ~~ / 23 / & / ef /

Just as with && inside a single match, iofo it's appropriate to read the above in English as "$a matches 23 and ef", but this time it's short for "$a matches 23 and $a matches ef", just as you wanted.

In the meantime, use of && inside a single match corresponds to the other useful conjunctional meaning, which is to say it refers to matching the regex atom on its left and the regex atom on its right to the same sub-string.
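A small sketch of that same-substring reading, reusing the string from the question (the \d+ pattern is my own addition for illustration). As I understand the semantics, the first match should succeed because the substring "23" satisfies both sides, while the second is the failing case from the question:

```raku
my $a = "123abc456def";

# Both sides must match the same substring: "23" is a run of digits
# and is also the literal 23.
say so $a ~~ / \d+ && 23 /;

# From the question: no single substring is both "23" and "ef".
say so $a ~~ / 23 && ef /;
```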

Iofo this is a highly intuitive approach once one becomes aware of, and then used to, these two possible interpretations of a conjunction.

If the $a string is long, you could try to reduce the run time by avoiding having to restart from the beginning of the string for each substring:

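my $a="123abc456def23";
# Track the patterns that have not been seen yet.
my %pats = <23 ef>.map({ $_ => 1 });
# Join them into one alternation so the string is scanned only once.
my $join = %pats.keys.join('|');
my $rx = rx{ <{$join}> };
for $a ~~ m:g/ $rx / -> $match {
    # Tick off the pattern that was just found.
    %pats{$match.Str}:delete;
    # Stop as soon as every pattern has been seen at least once.
    if %pats.keys.elems == 0 {
        say "Match!";
        last;
    }
}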

Of course, this does not make the code shorter (in the sense of more elegant), but it could reduce the run time.

If I have n regexes and I want to see if all these n regexes match the same whole string rather than match the same substring part of the whole string, then what is the best way to write the perl6 expression?

Here is an in-regex solution:

/ ^ [ $re1 && $re2 && $re3 ] $ /

Or if you want to be fancy:

/ [^ .* $ ] && $re1 && $re2 /

If you actually meant

how do I check if all my regexes match a string, even if not the same substring

you can express that as

/ .* $re1 .* && .* $re2 .* && .* $re3 .* /

To avoid excessive backtracking, you should anchor the whole regex:

/ ^ [ .* $re1 .* && .* $re2 .* && .* $re3 .* ] $ /
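Here is a minimal sketch of the anchored form with concrete values, reusing the string from the question. The two regexes assigned to $re1 and $re2 are placeholders of my own choosing, and only two branches are shown to keep it short. It should report a match, since each padded branch can stretch over the whole string:

```raku
my $a = "123abc456def";

# Hypothetical patterns standing in for $re1 and $re2 above.
my $re1 = /23/;
my $re2 = /ef/;

# Each branch is padded with .* so that it spans the whole string,
# which lets && compare the branches over the same substring.
say so $a ~~ / ^ [ .* $re1 .* && .* $re2 .* ] $ /;
```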


 