I was hoping to get a little explanation. I have the following script:
open (FILE, '2.txt');
@DNA = <FILE>;
$DNA = join ('', @DNA);
print "DNA = ". $DNA . "\n";
$a=0;
while ($DNA =~ //ig) {$a++;}
print "Total characters = ".$a."\n";
$b=0;
while ($DNA =~ /fl/ig) {$b++;}
print "Total fl = ".$b."\n";
$c=0;
while ($DNA =~ /[^fl]/ig) {$c++;}
print "Total character less fl = ".$c."\n";
exit;
The text document "2.txt" contains the following characters:
flkkkklllkkfewnofnewofewfl
When I run the script I get the following outputs:
DNA = flkkkklllkkfewnofnewofewfl
Total characters = 27
Total fl = 2
Total character less fl = 16
My question is: why, when I do
while ($DNA =~ /fl/ig) {$b++;}
does it count all the instances of fl together,
but when I do
while ($DNA =~ /[^fl]/ig) {$c++;}
it counts the number of characters that
are neither an f nor an l (i.e. the f and the l are treated separately)?
I was looking for the script to count the number of characters that are not part of fl (i.e. with f and l treated together).
[fl] is a character class; it means f or l.
It doesn't mean the substring fl.
So [^fl] counts all the characters that are neither f nor l.
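To make the difference concrete, here is a small sketch that counts all three patterns on the string from the question. The helper name count_matches is made up for illustration, and the string is assumed to have no trailing newline.

```perl
use strict;
use warnings;

# Sample string from the question (assumed: no trailing newline).
my $dna = 'flkkkklllkkfewnofnewofewfl';

# Hypothetical helper: assigning the match list to an empty list and
# then to a scalar yields the number of matches.
sub count_matches {
    my ($string, $regex) = @_;
    my $count = () = $string =~ /$regex/g;
    return $count;
}

my $substring = count_matches($dna, qr/fl/i);     # the two-character sequence "fl"
my $in_class  = count_matches($dna, qr/[fl]/i);   # any single f or l
my $not_class = count_matches($dna, qr/[^fl]/i);  # any single character other than f or l

print "fl = $substring, [fl] = $in_class, [^fl] = $not_class\n";
# prints: fl = 2, [fl] = 10, [^fl] = 16
```

The 10 and 16 add up to the 26 characters in the string, which shows that the character class looks at one character at a time.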
However, you could count the characters that are not part of fl with a regex like this:
/[^fl]|f(?!l)|(?<!f)l/
Formatted:
[^fl] # Not f nor l
| f (?! l ) # f not followed by l
| (?<! f ) l # l not following f
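As a rough sketch of how that pattern could be dropped into a counting routine, the following uses the /x flag so the commented layout can be kept as is. The sub name count_non_fl is hypothetical, and /i is added on the assumption that matching should stay case-insensitive as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: count characters that are not part of an "fl" pair.
sub count_non_fl {
    my ($string) = @_;
    my $count = () = $string =~ /
          [^fl]        # neither f nor l
        | f (?! l )    # f not followed by l
        | (?<! f ) l   # l not preceded by f
    /gix;
    return $count;
}

print count_non_fl('flkkkklllkkfewnofnewofewfl'), "\n";   # prints 22
```

Note that [^fl] also matches a newline, so a string read from a file may need chomp first if the trailing newline should not be counted.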
To keep it simple, consider removing all the instances of "fl" first, then counting the remaining characters (note that this modifies $DNA):
$DNA =~ s/fl//g;
print "Total characters less fl = ".length($DNA)."\n";
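A variant of the same idea, sketched as a helper that works on a copy so the original string is left untouched. The name length_without_fl is hypothetical. The chomp and the /i flag are assumptions, added so that a trailing newline is not counted and so that matching stays case-insensitive as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the string once every "fl" is removed.
sub length_without_fl {
    my ($string) = @_;                       # $string is a copy of the argument
    chomp $string;                           # drop a trailing newline, if any
    (my $stripped = $string) =~ s/fl//gi;    # remove "fl" from a second copy
    return length $stripped;
}

my $dna = "flkkkklllkkfewnofnewofewfl\n";
print "Total characters less fl = ", length_without_fl($dna), "\n";   # prints 22
```

Because the substitution runs on a copy, $dna can still be used for the other counts afterwards.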