Perl, match one pattern multiple times in the same line delimited by unknown characters

Question

I've been able to find similar, but not identical, questions to this one. How do I match one regex pattern multiple times in the same line when the matches are delimited by unknown characters?

For example, say I want to match the pattern HEY. I'd want to recognize all of the following:

HEY

HEY HEY

HEYxjfkdsjfkajHEY

So I'd count 5 HEYs there. So here's my program, which works for everything but the last one:

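use strict;
use warnings;

open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
my $count = 0;
while (<$fh>)
{
  foreach my $w ( split )
  {
      # Succeeds at most once per whitespace-separated word
      if ($w =~ m/HEY/g)
      {
            $count++;
      }
  }
}
print "$count\n";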

So my question is how do I replace that foreach loop so that I can recognize patterns delimited by weird characters in unknown configurations (like shown in the example above)?

Thanks for the great responses thus far. I just realized I need one other thing, though, which I've added below.

One question though: is there any way to save the matched term as well? In my case, is there any way to reference $w (say, if the regex were more complicated and I wanted to store each match in a hash with its number of occurrences)?

For example, if I were matching a real regex (say, a sequence of alphanumeric characters), I'd want to save each match in a hash.

Answer 1

One way is to capture all matches in the line and see how many you got, like so:

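open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $w = <$fh>) {
    chomp $w;                        # avoid a blank line after each result
    my @matches = $w =~ m/(HEY)/g;   # list-context //g: every match in the line
    my $count = scalar(@matches);
    print "$count\t$w\n";
}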
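If only the count is needed, a common Perl idiom skips the named array: assign the list-context match to an empty list `()`, and that list assignment, evaluated in scalar context, yields the number of elements on its right-hand side. A minimal sketch, assuming the question's three example lines are hard-coded rather than read from a file:

```perl
use strict;
use warnings;

# The question's example lines, hard-coded for illustration
my @lines = ('HEY', 'HEY HEY', 'HEYxjfkdsjfkajHEY');

my $total = 0;
for my $line (@lines) {
    # () = ... forces list context on the match; the assignment in
    # scalar context then returns how many matches there were
    my $count = () = $line =~ /HEY/g;
    print "$count\t$line\n";
    $total += $count;
}
print "total: $total\n";
```

No capture group is needed here: without parentheses, `m//g` in list context returns each matched string.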

EDIT:

Yes, there is! Just loop over all the matches, and use the capture variables to increment the count in a hash:

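my %hash;
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $w = <$fh>) {
   foreach ($w =~ /(HEY)/g) {
       # Fine here because every match is the literal HEY. After a
       # list-context m//g, $1 holds only the last match's capture, so
       # for a pattern with varying matches use $_ (the current match).
       $hash{$1}++;
   }
}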

Answer 2

The problem is that you really don't want to call split(). It splits the line into words, and you'll note that your last line has only a single "word" (though you won't find it in the dictionary). A word here is bounded by whitespace, so it is just "everything but whitespace".
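To see the problem concretely, here is a small sketch that runs the question's split-then-if logic on the last example line alone (hard-coded here rather than read from a file):

```perl
use strict;
use warnings;

# The question's last example line, hard-coded for illustration
my $line = 'HEYxjfkdsjfkajHEY';

# split ' ' splits on runs of whitespace, like a bare split;
# this line has no whitespace, so it comes back as one "word"
my @words = split ' ', $line;

# The question's per-word test: an if can succeed at most once per word
my $if_count = 0;
for my $w (@words) {
    $if_count++ if $w =~ /HEY/;
}

print scalar(@words), " word(s), $if_count HEY counted\n";
```

The line contains two HEYs, but only one is counted because the `if` runs once per word.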

What you really want is to keep looking through each line, counting every HEY and resuming where the previous match left off each time. That is what /g does in scalar context; the trick is to put the match in a while loop so it keeps looking:

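use strict;
use warnings;

my $count = 0;
while (<>)
{
      # In scalar context, m//g resumes from where the previous
      # match on $_ ended, so each HEY in the line is counted once.
      while (/HEY/g)
      {
            $count++;
      }
}

print "$count\n";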
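The "starting where you left off" behaviour comes from `pos()`: each successful scalar-context `m//g` records where the match ended, and the next attempt on the same string resumes there. A minimal sketch that makes this visible on the question's last example line, using the built-in `@-` array to report where each match began:

```perl
use strict;
use warnings;

# The question's last example line, hard-coded for illustration
my $line = 'HEYxjfkdsjfkajHEY';

my @offsets;
while ($line =~ /HEY/g) {
    my $start  = $-[0];       # offset where this match began
    my $resume = pos($line);  # offset where it ended; the next /g attempt starts here
    push @offsets, $start;
    print "match at $start, next search starts at $resume\n";
}
print scalar(@offsets), " match(es)\n";
```

Because the search resumes at the end of the previous match, matches never overlap; for example, `/aa/g` finds only one match in `'aaa'`.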

There is, of course, more than one way to do it but this sticks close to your example. Other people will post other wonderful examples too. Learn from them all!

Answer 3

None of the above answers worked for my similar problem. $1 does not change inside the loop (Perl 5.16.3): after a list-context m//g it holds the capture from the last successful match, so $hash{$1}++ just counts that one match n times.
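A quick way to check what `$1` actually holds inside such a loop is to print it next to the loop variable. A minimal sketch with made-up input (the string and pattern are hypothetical, chosen so that every match is different):

```perl
use strict;
use warnings;

# Hypothetical input where each match differs
my $line = 'ab12 cd34 ef56';

my @seen;
foreach my $m ($line =~ /([a-z]+\d+)/g) {
    # The list-context match finishes before the body runs:
    # $m steps through every capture, $1 stays the same
    push @seen, "$m/$1";
}
print "$_\n" for @seen;
```

The loop variable changes on each pass while `$1` does not, which is why the hash should be keyed on the loop variable.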

To get each match, give the foreach loop its own loop variable; on each pass it holds the next captured match. Here's a little script that matches every number in parentheses, such as "(1234)", and prints each distinct one:

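#!/usr/bin/perl -w
use strict;
use warnings FATAL=>'all';

my (%procs);
while (<>) {

    # List-context m//g returns every "(digits)" capture in the line;
    # $proc holds each one in turn.
    foreach my $proc ($_ =~ m/\((\d+)\)/g) {
        $procs{$proc}++;
    }

}

# One line per distinct number (hash order is unspecified)
print join("\n",keys %procs) . "\n";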

I'm using it like this:

pstree -p | perl extract_numbers.pl | xargs -n 1 echo

(except with some relevant filters in that pipeline). Any pattern with a capture group ought to work as well.
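Applied to the follow-up in the question (storing each matched alphanumeric sequence in a hash with its number of occurrences), the same loop-variable approach might look like the sketch below. The character class `[A-Za-z0-9]+` is an assumption about what "alphanumeric" means here (`\w` would also admit underscores), and the input lines are made up.

```perl
use strict;
use warnings;

# Hypothetical input lines; in practice these would come from <> or a file
my @lines = ('foo-42 bar', 'foo;baz42');

my %count;
for my $line (@lines) {
    # In list context m//g returns every alphanumeric run in the line;
    # the loop variable (not $1) holds the current one.
    foreach my $word ($line =~ /([A-Za-z0-9]+)/g) {
        $count{$word}++;
    }
}

for my $word (sort keys %count) {
    print "$word\t$count{$word}\n";
}
```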

3 answers

Answer 1: score 11, accepted, 2012-02-06 06:12:34

Answer 2: score 6, 2012-02-06 06:14:41

Answer 3: score 0, 2014-02-13 20:45:47