简体   繁体   中英

Perl: Find a sum of an array recursively with tail call optimization

I'm trying to make a tail optimized recursive function.

sub sum {
    my ($first, @rest) = @_;

    return @rest
        ? $first + sum(@rest)
        : $first;
}

say sum(1 .. 100);

It works for 100 elements however, it fails for 100_000 elements with Out of memory message.

How can improve the code to make the recursion working with more elements?

Edit

Tail call optimized version of the above function:

use feature qw( current_sub );

sub sum_tco {
    my $loop = sub {
        my ($sum, $first, @rest) = @_;
        $sum += $first;

        return @rest
            ? __SUB__->($sum, @rest)
            : $sum;
    };

    return $loop->(@_);
}

It appears that Perl 5 doesn't support TCO.

How to make TCO in Perl (if possible)?

You are correct that Perl doesn't perform tail call optimization.

If you had a tail call, you could optimize it yourself. But that said, you don't have a tail call. The recursive call is followed by an addition.

So let's start by changing the sub to have only tail calls. This is done by passing forward the information needed to perform that last operation.

sub _sum {
   my ($acc, $first, @rest) = @_;
   $acc += $first;
   return @rest ? _sum( $acc, @rest ) : $acc;
}

sub sum {
   my (@rest) = @_;
   return undef if !@rest;
   return _sum( 0, @rest );
}

Now we can perform tail call optimizations.

  1. Place the body of the recursive subroutine in an infinite loop.
  2. Replace recurse(...) with do { @_ = ...; next; } do { @_ = ...; next; } do { @_ = ...; next; } .

First in the helper.

sub _sum {
   while (1) {
      my ($acc, $first, @rest) = @_;
      $acc += $first;
      if (@rest) {
         @_ = ( $acc, @rest );
      } else {
         return $acc;
      }
   }
}

sub sum {
   my (@rest) = @_;
   return undef if !@rest;
   return _sum( 0, @rest );
}

Then in the main sub.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   @_ = ( 0, @rest );
   while (1) {
      my ($acc, $first, @rest) = @_;
      $acc += $first;
      if (@rest) {
         @_ = ( $acc, @rest );
      } else {
         return $acc;
      }
   }
}

Done.

...kinda. There are so many other cleanups and optimizations we can do now.

Let's start by improving the flow.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   @_ = ( 0, @rest );
   while (1) {
      my ($acc, $first, @rest) = @_;
      $acc += $first;
      return $acc if !@rest;

      @_ = ( $acc, @rest );
   }
}

No need to create a new $acc every pass through the loop.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   my $acc = 0;
   while (1) {
      my ($first, @rest) = @_;
      $acc += $first;
      return $acc if !@rest;

      @_ = @rest;
   }
}

There's no need to use @_ anymore.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   my $acc = 0;
   while (1) {
      (my $first, @rest) = @rest;
      $acc += $first;
      return $acc if !@rest;
   }
}

Let's replace the expensive list assignment.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   my $acc = 0;
   while (1) {
      my $first = shift(@rest);
      $acc += $first;
      return $acc if !@rest;
   }
}

Let's simplify the loop.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   my $acc = 0;
   while (@rest) {
      my $first = shift(@rest);
      $acc += $first;
   }

   return $acc;
}

Let's replace the while loop with a cheaper foreach loop.

sub sum {
   my (@rest) = @_;
   return undef if !@rest;

   my $acc = 0;
   for my $first (@rest) {
      $acc += $first;
   }

   return $acc;
}

$first and @rest are no longer appropriate variable names. We'll get rid of a useless copy of @_ in the process.

sub sum {
   return undef if !@_;

   my $acc = 0;
   $acc += $_ for @_;
   return $acc;
}

If we initialize $acc to undef , the initial check is no longer needed.

sub sum {
   my $acc;
   $acc += $_ for @_;
   return $acc;
}

Tada!

Here's a TCO version using that goto feature I mentioned in a comment:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;

sub sum {
    return undef if @_ == 0;
    return $_[0] if @_ == 1;
    splice @_, 0, 2, $_[0] + $_[1];
    goto ∑
}

say sum(1..100);
say sum(1..100_000);

From the documentation :

The goto &NAME form is quite different from the other forms of goto. In fact, it isn't a goto in the normal sense at all, and doesn't have the stigma associated with other gotos. Instead, it exits the current subroutine (losing any changes set by local) and immediately calls in its place the named subroutine using the current value of @_

I don't recommend actually using this as it's really, really slow compared to anything else, but it can be done.

Here's a technique using a generic run - recur interface. This is effectively a trampoline -

sub recur (*@values) {
  :{ 'recur' => &recur, 'values' => @values }
}

sub run (&f) {
  my $r = &f();
  while $r.isa(Hash) && $r{'recur'} === &recur {
    $r = &f(|$r{'values'});
  }
  return $r;
}

To use it, we pass a subroutine to run with loop arguments and their initial values -

sub sum ($n = 0) {
  run (sub ($m = $n, $r = 0) {
    if $m == 0 {
      return $r;
    }
    else {
      recur($m - 1, $r + $m);
    }
  })
}

NB we use recur with the updated arguments, instead of calling sum directly. Here's the output -

say sum(100_000);
# 100_000 + 99_999 + 99_997 + ... + 3 + 2 + 1 =
# => 5000050000

# cpu time: 10.61 sec

Here it is working on a range instead. We use loop variables to keep track of the range index, $i and the return value, $r -

sub sum (@range) {
  run (sub ($i = 0, $r = 0) {
    if $i >= @range {
      return $r;
    }
    else {
      recur($i + 1, $r + @range[$i]);
    }
  })
}

say sum(5..10);
# 5 + 6 + 7 + 8 + 9 + 10 =
# => 45

say sum(0..0);
# => 0

say sum(1..100_000);
# => 5000050000

# cpu time: 14.37 sec

The other techniques presented here ask that you dramatically change your program in order to avoid stack overflow. The unique run - recur interface allows you to think about your problem recursively and allow it to run in constant space.


Here is a revision compatible with Perl 5. To my surprise, this program is almost 50 times faster. Maybe the slow down is owed to the poor implementation of newer syntax sugars? It's anyone's guess...

use strict;
use warnings;

sub recur {
  { recur => \&recur, values => \@_ }
}

sub run {
  my ($f, @init) = @_;
  my $r = &{$f}(@init);
  while (ref $r eq ref {} && $r->{'recur'} == \&recur) {
    $r = &{$f}(@{$r->{'values'}});
  }
  return $r;
}

sub sum {
  my ($n) = @_;
  run (sub {
    my ($m, $r) = @_;
    if ($m == 0) {
      return $r;
    }
    else {
      recur($m - 1, $r + $m);
    }
  }, $n, 0);
}

print sum(100_000);
# => 5000050000

# cpu: 0.25 sec
# mem: 3 Mb

And the sum variant that takes a range input -

sub sum {
  my (@range) = @_;
  run (sub {
    my ($i, $r) = @_;
    if ($i >= @range) {
      return $r;
    }
    else {
      recur($i + 1, $r + $range[$i]);
    }
  }, 0, 0);
}

print sum(1..100_000);
# => 5000050000

# cpu: 0.27 sec
# mem: 12 Mb

A file calls itself with a reducing number of arguments, inspired by this post: A simple perl recursion example .

This, of course, is far away from a practical solution.

#!/usr/bin/perl
use strict;
use warnings;
use feature qw(say);

# Init the args at the first call
if (!@ARGV) {exec join(' ', $^X, $0, 1 .. 100_000)}

# Show progress
if (@ARGV % 100 == 0) {say scalar @ARGV}

my ($sum, $first, @rest) = @ARGV;
$sum += $first;

@rest
    ? exec join(' ', $^X, $0, $sum, @rest)
    : say $sum;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM