Perl regex matching numbers

Question

I'm working on a Perl assignment. One of the requirements is to match all integer and float numbers except those in comments or strings (double or single quoted).

Here are my assumptions:

A number has an optional sign, an integer part, and a fraction part.
If the integer part is omitted, the fraction is mandatory.
If the fraction part is omitted, the decimal point must be omitted too.

And here is the regex I found.

([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))
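A quick way to check the pattern against the three assumptions is to anchor it and try a few candidates. This is only a sketch: the candidate list and the %is_number hash are made up for illustration, and \A and \z are added here so that each string has to match in full.

```perl
use strict;
use warnings;

# Anchored, non-capturing copy of the pattern above.
my $number = qr/\A[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)\z/;

# Hypothetical candidates: the last three should be rejected.
my @candidates = ('42', '-3.22', '+0.01', '.5', '5.', '.', '+');
my %is_number  = map { $_ => ($_ =~ $number ? 1 : 0) } @candidates;

printf "%-6s %s\n", $_, $is_number{$_} ? 'match' : 'no match'
    for @candidates;
```

Under these assumptions '5.' is rejected because a decimal point without a fraction is not allowed, while '.5' is accepted.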

Here is my block of code. I had trouble excluding numbers in comments and strings, so I remove all comments and strings first. I also split lines into words, which I thought would be easier, although I believe it should not be necessary.

while (<$IN_FILE>) {
    s/^(#[^!]+$)//;            # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g;        # remove all single line strings
    push @words, split;        # split line into words
}

foreach my $item (<@words>) {
    push @numbers, $1 if $item =~ /([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/;
}

It worked OK but failed to match array indexes like the 0 in ARGV[0].

So I need some help improving my code. It would be nice if I didn't have to remove comments and strings first or split lines into words, and of course it should match all the numbers that are not in comments or strings.

Sample input

# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

Here is the output from my code. It missed the 0 in ARGV[0] and the 11 in (8..11). I won't be surprised if it misses more.

[Numbers]
3.1415926
-3.22
+0.01
8
2
3
4

Answer 1

The main problem is here:

foreach my $item (<@words>) {

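You want to iterate over @words, so no <> are needed. Inside angle brackets the array is interpolated into a pattern and passed to glob, which changes the list you iterate over: an item containing glob metacharacters, such as $ARGV[0], is replaced by the file names it matches (usually none). Just insert

warn "\t$item\n";

into the last loop to see what's being processed.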

Even after fixing this, (8..11) will be tokenized into one "word". You match without /g, so you cannot get more than one number from an item.
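Both points can be seen in a small sketch. The two-item @words list is made up for illustration, and what the glob returns for an item with metacharacters depends on the files in the current directory, so treat the first line of output as environment-dependent.

```perl
use strict;
use warnings;

my @words = ('$ARGV[0];', '(8..11);');

# <@words> is parsed as glob("@words"). An item containing glob
# metacharacters such as [ ] is replaced by the file names it matches,
# so '$ARGV[0];' is not passed through as-is.
my @globbed = <@words>;
print "globbed: @globbed\n";

# Iterating over the array itself keeps every item.
print "plain:   @words\n";

# In list context, /g returns every match in the item, not just the first.
my @first_only = '(8..11);' =~ /([0-9]+)/;
my @all        = '(8..11);' =~ /([0-9]+)/g;
print "without /g: @first_only\n";
print "with /g:    @all\n";
```

Without /g only the 8 is captured; with /g both 8 and 11 come back.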

Answer 2

As choroba already pointed out, your use of <@words> is an obvious bug.

However, you should simplify things by not breaking your lines into words in the first place, and instead use /g to match every number on a line:

use strict;
use warnings;

my @numbers;
while (<DATA>) {
    s/^(#[^!]+$)//;            # remove whole line comments
    s{(^[^#]+?)(#[^/]+$)}{$1}; # remove inline comments
    s/('.*?'|".*?")//g;        # remove all single line strings

    while (/([-+]?([0-9]+(\.[0-9]+)?|\.[0-9]+))/g) {
        push @numbers, $1;
    }
}

print "@numbers";

__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

This will end up pulling too many results: it picks up the 2 in sample2 and reads (8..11) as 8 and .11. One solution is to add word boundaries around the digits in the regex:

while (/([-+]?\b([0-9]+(\.[0-9]+)?|\.[0-9]+))\b/g) {

Outputs:

3.1415926 -3.22 +0.01 8 11 0 3 4
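One caveat with the \b version: \b in front of \.[0-9]+ only succeeds when a word character precedes the dot, so a bare .5 would be reported as 5. The question also asks whether comments and strings have to be removed first. A minimal sketch of one alternative, which is not from the original answer, is to let a single regex consume strings, comments and the range operator without capturing them, and capture only numbers. The helper name extract_numbers and the shortened sample are made up for illustration, and the sketch treats every # outside a string as the start of a comment, which real Perl code can violate (for example $#array).

```perl
use strict;
use warnings;

# Hypothetical helper: collect numbers from Perl-like source text while
# skipping quoted strings and # comments in the same pass.
sub extract_numbers {
    my ($src) = @_;
    my @found;
    while ($src =~ /
          "(?:\\.|[^"\\])*"    # double-quoted string, consumed and ignored
        | '(?:\\.|[^'\\])*'    # single-quoted string, consumed and ignored
        | \#.*                 # comment up to end of line, consumed and ignored
        | \.{2,3}              # range operator, so (8..11) is not read as .11
        | (?<!\w)              # do not start inside an identifier (sample2)
          ( [-+]? (?: [0-9]+ (?: \.[0-9]+ )? | \.[0-9]+ ) )
          (?!\w)               # do not end inside a word
    /gx) {
        # Only the number alternative sets $1.
        push @found, $1 if defined $1;
    }
    return @found;
}

my $sample = <<'END';
# Comment 1
my $pi = 3.1415926;
my $test = -3.22;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   return 3 + 4 eq "7"; # true or false
}
END

print join(' ', extract_numbers($sample)), "\n";
```

For this sample the sketch should print 3.1415926 -3.22 8 11 0 3 4. The lookarounds replace \b so that a leading dot is still allowed after a non-word character.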

The best way to accomplish this is by using PPI though. That is definitely outside of the scope of what your professor is trying to teach you, but to demonstrate:

use strict;
use warnings;

use PPI;

my $src = do {local $/; <DATA>};

# Load a document
my $doc = PPI::Document->new( \$src );

# Find all the number tokens within the doc
# (find returns a false value when nothing matches, hence the fallback)
my $nums = $doc->find( 'PPI::Token::Number' ) || [];
for (@$nums) {
    print $_->content, "\n";
}

__DATA__
# Comment 1
my $time = <STDIN>;
chomp $time;
   #now write input to STDOUT
print $time . "\n";
my $pi = 3.1415926;
my $test = -3.22;
my $t = +0.01;
my $range = (8..11);
if $ARGV[0];
sub sample2 {
   print "true or false";
   return 3 + 4 eq "7"; # true or false
}

Outputs:

solution1
2 ACCEPTED 2014-04-13 10:42:08

solution2
1 2014-04-13 22:15:13
