
Perl regular expressions explanation

I was hoping to get a little explanation. I have the following script:

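open(my $fh, '<', '2.txt') or die "Cannot open 2.txt: $!";   # 3-arg open, fail loudly
@DNA = <$fh>;
close($fh);
$DNA = join('', @DNA);

print "DNA = " . $DNA . "\n";

# With no earlier successful match, // is an empty pattern that matches at
# every position, including after the last character: length($DNA) + 1
$a=0;
while ($DNA =~ //ig) {$a++;}
print "Total characters = ".$a."\n";

# fl is a two-character substring, so each "fl" is counted once
$b=0;
while ($DNA =~ /fl/ig) {$b++;}
print "Total fl = ".$b."\n";

# [^fl] is a character class: any single character that is neither f nor l
$c=0;
while ($DNA =~ /[^fl]/ig) {$c++;}
print "Total character less fl = ".$c."\n";

exit;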

The text document "2.txt" contains the following characters:

flkkkklllkkfewnofnewofewfl

When I run the script, I get the following output:

DNA = flkkkklllkkfewnofnewofewfl
Total characters = 27
Total fl = 2
Total character less fl = 16
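Side note on the first figure: the sample line is 26 characters long, yet the script reports 27. With no earlier successful match, `//` is a genuinely empty pattern, and under /g it matches once at every position, including the position after the last character. The minimal sketch below compares that count with `length`; the sample string is hard-coded instead of read from 2.txt, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample data hard-coded here instead of being read from 2.txt
my $DNA = 'flkkkklllkkfewnofnewofewfl';

# Empty pattern: with no earlier successful match in this script, it matches
# the empty string at each position 0 through length($DNA)
my $empty_matches = 0;
$empty_matches++ while $DNA =~ //g;

# length() counts the actual characters
my $length = length $DNA;

print "Empty-pattern matches = $empty_matches\n";   # 27
print "length()              = $length\n";          # 26
```

So `length($DNA)` is the simpler and exact way to get the character count.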

My question is: why does
while ($DNA =~ /fl/ig) {$b++;} count each instance of fl as a unit,

but
while ($DNA =~ /[^fl]/ig) {$c++;} counts the characters that are neither an f nor an l (i.e. the f and the l are treated separately)?

I want the script to count the characters that are not part of fl (i.e. with fl treated as a unit).

[fl] is a character class, meaning a single f or l.
It does not mean the substring fl.

So [^fl] counts all the characters that are not f or l.
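To see the difference on the sample string, here is a small sketch that counts matches of [fl], fl and [^fl] side by side. The string is hard-coded, and instead of a while loop it uses Perl's list-assignment counting idiom (a list assignment in scalar context returns the number of elements on its right-hand side).

```perl
use strict;
use warnings;

my $DNA = 'flkkkklllkkfewnofnewofewfl';

# A list assignment in scalar context yields the number of matches
my $class_matches     = () = $DNA =~ /[fl]/g;    # single characters f or l
my $substring_matches = () = $DNA =~ /fl/g;      # the two-character substring "fl"
my $negated_matches   = () = $DNA =~ /[^fl]/g;   # single characters other than f or l

print "[fl]  matches: $class_matches\n";       # 10
print "fl    matches: $substring_matches\n";   # 2
print "[^fl] matches: $negated_matches\n";     # 16
```

The 5 f's and 5 l's give 10 class matches; 26 - 10 = 16 is the number you saw.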

However, you can count the characters that are not part of fl with a single regex like this:

/[^fl]|f(?!l)|(?<!f)l/

Formatted with the /x modifier (which ignores whitespace and # comments in the pattern):

    [^fl]          # Not f nor l
 |  f (?! l )      # f not followed by l
 |  (?<! f ) l     # l not following f
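A minimal sketch of the formatted regex in use, compiled with qr//x and counted with the same kind of while loop as in the question. The sample string is hard-coded and `$not_in_fl` is just an illustrative name.

```perl
use strict;
use warnings;

my $DNA = 'flkkkklllkkfewnofnewofewfl';

# The three-way alternation; /x lets us lay it out with spaces and comments
my $not_in_fl = qr/
      [^fl]          # Not f nor l
   |  f (?! l )      # f not followed by l
   |  (?<! f ) l     # l not following f
/x;

# Each match consumes one character that is not part of an "fl" pair
my $count = 0;
$count++ while $DNA =~ /$not_in_fl/g;

print "Total characters less fl = $count\n";   # 22
```

If your data can contain upper-case F or L, add the /i modifier (qr/.../xi), as in your original loops.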

To keep it simple, you could instead delete every instance of "fl" first and then count the remaining characters:

(my $rest = $DNA) =~ s/fl//gi;   # strip every "fl" from a copy, leaving $DNA intact
print "Total characters less fl = ".length($rest)."\n";
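As a cross-check, the sketch below runs the substitution non-destructively and compares it with plain arithmetic: each "fl" accounts for exactly two characters, so the result should equal the length minus twice the number of fl matches. The sample string is hard-coded, and s///r assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $DNA = 'flkkkklllkkfewnofnewofewfl';

# s///r returns the modified copy and leaves $DNA unchanged (Perl 5.14+)
my $rest = $DNA =~ s/fl//gr;
my $by_substitution = length $rest;

# Each "fl" match covers two characters
my $fl_count = () = $DNA =~ /fl/g;
my $by_arithmetic = length($DNA) - 2 * $fl_count;

print "By substitution: $by_substitution\n";   # 22
print "By arithmetic:   $by_arithmetic\n";     # 22
```

Both agree with the lookaround regex on this input.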

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM