I'm working on a Perl assignment. One of the requirements is to match all integer and float numbers except those in comments or strings (double or single quoted).
Here is my assumption:
And here is the regex I found.
([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))
Here is my block of code. I had trouble excluding numbers in comments and strings, so I remove all comments and strings first. I also split lines into words because I believed that would be easier, though I suspect it should not be necessary.
while (<$IN_FILE>) {
    s/^(#[^!]+$)//; # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g; # remove all single line strings
    push @words, split; # split line into words
}
foreach my $item (<@words>) {
    push @numbers, $1 if $item =~ /([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/;
}
It worked OK but failed to match an array index like the 0 in ARGV[0].
So I need some help improving my code. It would be nice if I didn't have to remove comments and strings first or split lines into words, and of course it should match all the numbers that are not in comments or strings.
Sample input:
# Comment 1
my $time = <STDIN>;
chomp $time;
#now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
print "true or false";
return 3 + 4 eq "7"; # true or false
}
Here is the output from my code. It missed the 0 in ARGV[0] and the 11 in (8..11). I won't be surprised if it misses more.
[Numbers]
3.1415926
-3.22
+0.01
8
2
3
4
The main problem is here:
foreach my $item (<@words>) {
You want to iterate over @words, so the <> are not needed. They turn the expression into a glob, which changes the list you iterate over. Just insert
warn "\t$item\n";
into the last loop to see what's being processed.
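To see why the angle brackets matter, here is a minimal sketch. The sample words are made up for illustration, and it assumes the current directory has no file literally named $ARGV0; (which the pattern below would otherwise match). Since <@words> behaves like glob("@words"), each word is treated as a filename pattern, so a word containing [0] is read as a character class and silently dropped when no file matches:

```perl
use strict;
use warnings;

# Hypothetical sample words, as split would produce them from the input.
my @words = ('$ARGV[0];', '3.1415926;', '(8..11);');

# <@words> behaves like glob("@words"): the list is joined with spaces,
# split on whitespace again, and each piece is treated as a filename pattern.
my @globbed = glob("@words");

# '[0]' is a glob character class; with no matching file that word vanishes,
# while words without wildcard characters come back unchanged.
print "$_\n" for @globbed;
```

Iterating over @words directly avoids this step entirely.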
Even after fixing this, (8..11) will be tokenized into one "word". You match without /g, so you cannot get more than one number from an item.
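The effect of the missing /g can be checked in isolation. The sketch below uses a copy of the regex with non-capturing inner groups (my change, so that each match returns just one value) and a made-up item holding two numbers. It compares a single match with a list-context /g match:

```perl
use strict;
use warnings;

# Same pattern as in the question, but with non-capturing inner groups.
my $num = qr/([-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))/;

my $item = '(8,11);';    # hypothetical "word" holding two numbers

my ($first) = $item =~ $num;       # without /g: only the first match
my @all     = $item =~ /$num/g;    # with /g in list context: every match

print "first: $first\n";
print "all:   @all\n";
```

In list context, /g returns the captured group of every match, which is why a loop or list assignment with /g is needed to collect more than one number per item.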
As choroba already pointed out, your use of <@words> is an obvious bug.
However, you should simplify things by not breaking your lines into words in the first place and instead use /g to match repeatedly within each line:
use strict;
use warnings;
my @numbers;
while (<DATA>) {
    s/^(#[^!]+$)//; # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g; # remove all single line strings
    while (/([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/g) {
        push @numbers, $1;
    }
}
print "@numbers";
__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
#now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
print "true or false";
return 3 + 4 eq "7"; # true or false
}
This will end up pulling too many results, such as the 2 in sample2. One solution is to add word boundaries around the numbers in the regex:
while (/([-+]?\b([0-9]+(\.[0-9]+)?|\.[0-9]+))\b/g) {
Outputs:
3.1415926 -3.22 +0.01 8 11 0 3 4
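To make the effect of the word boundaries concrete, here is a small comparison on a made-up line. The inner groups are non-capturing (my variation, so that a list-context match returns one value per number). Otherwise the patterns follow the ones above:

```perl
use strict;
use warnings;

my $line = 'sub sample2 { my $range = (8..11); }';    # hypothetical input

# Without \b: the digit inside the identifier and ".11" from the range match.
my @loose = $line =~ /([-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))/g;

# With \b on both sides of the number, as suggested above.
my @bounded = $line =~ /([-+]?\b(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+))\b/g;

print "without \\b: @loose\n";      # 2 8 .11
print "with \\b:    @bounded\n";    # 8 11
```

One caveat with this placement: because \b sits directly before the optional leading dot, a float written as .5 after a space or operator would be picked up as 5 rather than .5.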
The best way to accomplish this is by using PPI, though. That is definitely outside the scope of what your professor is trying to teach you, but to demonstrate:
use strict;
use warnings;
use PPI;
my $src = do {local $/; <DATA>};
# Load a document
my $doc = PPI::Document->new( \$src );
# Find all the number tokens within the doc
my $nums = $doc->find( 'PPI::Token::Number' );
for (@$nums) {
    print $_->content, "\n";
}
__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
#now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
print "true or false";
return 3 + 4 eq "7"; # true or false
}
Outputs:
3.1415926
-3.22
0.01
8
11
0
3
4