简体   繁体   中英

How can I quickly count the maximum number of consecutive single characters in a string?

I have a string similar to: but much longer

my $a = "000000001111111111000000011111111111111111111111111111111";

I am counting the number of "1"'s with:

my $total_1_available = $a =~ tr/1//;

And that works amazingly well and is really FAST.

HOWEVER, I also wish to count (in a fast way), the total number of consecutive 1's in a row. The MAX COUNT of "1's" consecutively.

IN the example above, it would return the count of:

11111111111111111111111111111111

As this is the maximum in a row.

So, I end up with the TOTAL_COUNT and also the TOTAL_CONSECUTIVE_COUNT.

I have it working with a REGEXP which basically replaces the 1's and then counts what was replaced and loops around...which actually is totally fine and works... but it doesn't "feel" right.

Ideally I don't want to replace the string at all as I am looking for the max consecutive count.

But, I know in Perl this probably isn't the fastest or cleanest way.

Could you teach me a better way please and increase my learning?

AS requested this is my current code:

 my $a= "0110011001101111";
 my $total_1_available = $a =~ tr/1//;
 print "Total number of 1's = $total_1_available\n";

 my $max_c = 0;
 while ( $a=~s/(1+)/ / ) {
   $max_c = length($1) if length($1) > $max_c;
 }
 print "Consecutive count   = $max_c\n";

And FINAL CODE:

use strict;
use warnings;
use Benchmark ':all';
use String::Random;

## We test 525,600 as this is the length of the string.
## Actually each 0 or 1 represents a minute of the year.
## And these represent engineer minues available in a 24 hr / 365 day year.
## And there are lots and lots of engineers.
## Hence my wish to improve the performance and I wish to thank everyone whom responded.

## there are a lot more 0's than 1's so hack to sort of simulate
my $test_regex = '[0][0][0][0][0][0-1][0-1][0-1][0-1][0-1]' x 52560;
my $pass       = String::Random->new;
my $string     = $pass->randregex($test_regex);

cmpthese(-1, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});

#                BLOWN AWAY !!!!!!
#                BLOWN AWAY !!!!!!
#                BLOWN AWAY !!!!!!
#                BLOWN AWAY !!!!!!

Remarkable speed improvements can be made with a dynamic regex. We can use a variable to store the max length string, then search for a string that is that long, plus one or more. The theory being that we only need to look for strings longer than the one we already have.

I used a solution that looks like this

sub hack {
    my $match = "";                        # original search string
    while ($string =~ /(${match}1+)/g) {   # search for $match plus 1 or more 1s
        $match = $1;                       # when found, change to new match
    }
    length $match;                         # return max length
}

And compared it to the original method described by the OP, with the following result

use strict;
use warnings;
use Benchmark ':all';

my $string = '0100100101111011010010101101101110101011111111101010100100100001011101010100' x 10_000;

cmpthese(-1, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});

Output:

       Rate    org   hack
org  7.31/s     --   -99%
hack 1372/s 18669%     --

Which seems astonishingly high, 19000% faster. It makes me think I've made a mistake, but I can't think what that would be. Maybe I am missing something in the regex machine internals, but this would be quite the improvement on the original solution.

For short strings, the following is faster than all previously-presented solutions:

use List::Util qw( max );

max 0, map length, split /[^1]+/, $s
          Rate  hack  sort   org   max  mxsp    xs
hack   76879/s    --  -12%  -34%  -37%  -48%  -98%   <-- TLP
sort   87664/s   14%    --  -24%  -28%  -41%  -98%   <-- Jim Davis
org   115660/s   50%   32%    --   -6%  -22%  -98%   <-- OP
max   122504/s   59%   40%    6%    --  -17%  -98%   <-- Jim Davis
mxsp  147867/s   92%   69%   28%   21%    --  -97%   <-- ikegami (above)
xs   4950278/s 6339% 5547% 4180% 3941% 3248%    --   <-- ikegami (below)

Benchmark code:

use Benchmark qw( cmpthese );

my $string = ( '01001001011110110100101011011011101010'
             . '11111111101010100100100001011101010100' );

cmpthese(-3, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len; } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } my $max = length($match); },
    sort => sub { my $max = ( sort { $b <=> $a } $string =~ /(1+)/g )[0]; },
    max  => sub { my $max = max 0, map length, $string =~ /(1+)/g; },
    mxsp => sub { my $max = max 0, map length, split /[^1]+/, $string; },
    xs   => sub { my $max = longuest_ones_count($string); },
});

That said, the fastest solution would involve XS. The following is my stab at it:

IV longuest_ones_count(SV* sv) {
   IV max = 0;
   IV count = 0;

   // This code works whether the string is upgraded or downgraded.
   STRLEN len;
   char *s = SvPV(sv, len);
   while (len--) {
      if (*(s++) == '1') {
         ++count;
      }
      else if (count) {
         if (max < count)
            max = count;

         count = 0;
      }
   }

   if (max < count)
      max = count;

   return max;
}

One way of using it:

use 5.014;
use warnings;

use Inline C => <<'__';

...above code here...

__


say "$_: ", longuest_ones_count($_)
   for qw(
      0
      11111
      011111
      111110
      01110111110
      01111101110
   );

You saw this beat the other solutions out of the water for short strings. But you don't have short strings. For long strings, this is not as fast as TLP's version!!!

Same benchmark as above but using

my $string = ( '01001001011110110100101011011011101010'
             . '11111111101010100100100001011101010100' ) x 10_000;
       Rate   sort    org    max   mxsp     xs   hack
sort 8.61/s     --   -25%   -31%   -44%   -99%   -99%
org  11.6/s    34%     --    -8%   -24%   -99%   -99%
max  12.5/s    46%     9%     --   -18%   -99%   -99%
mxsp 15.3/s    77%    32%    22%     --   -99%   -99%  <-- ikegami (Perl)
xs   1031/s 11870%  8822%  8118%  6653%     --   -25%  <-- ikegami (XS)
hack 1366/s 15772% 11731% 10797%  8855%    33%     --  <-- TLP

Wow the regex engine is good, It can obviously be beat using XS (by eliminating the time needed to compile the pattern)? but what's the point?

I'd probably do something like this:

use List::Util 'max';

my $string = '01011101100000111111001';

my $longest_run = max( 0, map { length } $string =~ /(1+)/g );

That fetches the length of each matched group of 1s and picks the largest. Inserted a 0 so you don't get an undef if there aren't any.

$ perl -MList::Util=max \
  -E 'say $_, " ", max(0, map { length } /(1+)/g) for @ARGV' \
  0 1 00010110 011101111110100110

0 0
1 1
00010110 2
011101111110100110 6

Edit: @TLP's comment made me curious, because I liked the sort solution.

#!/usr/bin/env perl

use v5.16;
use warnings;

use Benchmark ':all';
use List::Util 'max';

my $string = '0100100101111011010010101101101110101011111111101010100100100001011101010100';

cmpthese(1_000_000, {
    sort => sub { my $x =    (  sort { $b <=> $a } $string =~ /(1+)/g)[0] },
    max  => sub { my $x = max(0, map { length    } $string =~ /(1+)/g)    },
});

resulted in:

        Rate sort  max
sort 84890/s   --  -9%
max  93023/s  10%   --

Perhaps longer/shorter test strings would yield different results?

Perl allows you to create hashes on the fly, you can use this to do counting.

loop through each charecter of $a using that letter to increment the contents of the hash. at the end of the loop you will have a hash with keys containing each letter and values containing the count for each one.

foreach $letter (split //, $a) {
    if $letter eq $last { 
        $consecutive_count{$letter}++
    } else {
        if ($consecutive_count{$letter} > $consecutive_runs{$letter})
             $consecutive_runs{$letter} = $consecutive_count{$letter};
             $consecutive_count{$letter} = 0;
        }
    }   
    $counts{$letter}++;
    $last = $letter;    
}

foreach my $key (keys %counts) {
    print "$key occured $counts{$letter} times";
    print "longest consecutive run for $key was $consecutive_runs{$key}";
}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM