
Perl: Performance of array-insert using 'splice()' VS linked-list

I have a script in which I use Perl arrays. Each array contains hundreds of thousands of items.

I frequently need to dynamically add items in the middle of an array, or to delete items from it.

I want to understand whether I should use linked lists instead of Perl arrays, since I make frequent insertions and deletions.

So my questions are:

  • How is splice() implemented?
  • What is the complexity of splice() when it is used to insert item x at index i in a Perl array?
  • Can you recommend a Perl linked-list module that you've worked with?

Thanks!

Perl arrays are stored as an array of pointers, a beginning offset, a length, and an allocated length.

So inserting into or deleting from the middle requires moving one pointer (4 or 8 bytes) for each later element in the array. Deleting from either end doesn't require moving anything, just adjusting the beginning offset or the length. Inserting at the end usually just requires adjusting the length, but occasionally requires reallocating the entire array of pointers. When inserting at the beginning, perl does its best to arrange things so that only the beginning offset needs to be adjusted, but sometimes the entire array has to be moved or even reallocated.
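To make the cases above concrete, here is a minimal sketch that performs an append, a prepend, a middle insert, and a middle delete using only splice(). The helper name splice_demo is made up for illustration. The comments describe which case from the paragraph above each call falls into, not a measurement.

```perl
use strict;
use warnings;

# Illustrative helper (hypothetical name): end and middle operations
# on a small array, all expressed through splice().
sub splice_demo {
    my @a = (1 .. 5);

    splice(@a, scalar(@a), 0, 6);      # append: same effect as push @a, 6
    splice(@a, 0, 0, 0);               # prepend: same effect as unshift @a, 0
    splice(@a, 3, 0, 'x');             # middle insert: later elements shift right
    my ($removed) = splice(@a, 3, 1);  # middle delete: later elements shift left

    return ($removed, @a);
}

my ($removed, @result) = splice_demo();
print "removed: $removed\n";   # removed: x
print "array:   @result\n";    # array:   0 1 2 3 4 5 6
```

Only the two middle operations have to move the pointers of the later elements; the end operations normally just adjust the offset or the length.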

In practice, the overhead of creating and managing a linked list using perl operations is going to be much greater in almost every case than just using an array.

To benchmark it, we would need to know a lot more about your particular case; what actual size of array, what kind and size of elements (not relevant to the cost of splice, but perhaps relevant to a linked list), relative frequency of inserts/deletes, etc.

I did a quick splicing benchmark, and splice seems to behave as O(N) for both removals and insertions.

Script:

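use strict;
use warnings;
use Time::HiRes;

my $length = shift or die "Usage: $0 <array length>\n";
my $toSplice = 100;    # number of removals, then insertions, to time

my @list = (1 .. $length);

# Remove $toSplice elements at random positions
my $t0 = Time::HiRes::time();
for (1 .. $toSplice) {
    my $removeIdx = int(rand() * @list);
    splice @list, $removeIdx, 1;
}

# Insert $toSplice elements at random positions
my $t1 = Time::HiRes::time();
for (1 .. $toSplice) {
    my $insertIdx = int(rand() * @list);
    splice @list, $insertIdx, 0, 0;
}

printf("Took %.4fs to remove\n", $t1 - $t0);
# Measured from $t0, so this figure also includes the removal loop
printf("Took %.4fs to insert\n", Time::HiRes::time() - $t0);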

Results:

$ perl test.pl 100000
Took 0.0026s to remove
Took 0.0092s to insert
$ perl test.pl 1000000
Took 0.0296s to remove
Took 0.0617s to insert
$ perl test.pl 10000000
Took 0.2876s to remove
Took 0.6252s to insert

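So increasing the array length by 10x (with the number of splices fixed at 100) increased the run time by roughly 10x, which is what O(N) per splice predicts. Note that the "insert" figure is measured from the start of the first loop, so it also includes the removal time.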
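As a quick check of that scaling claim, the following sketch computes the growth ratio between consecutive runs from the timings listed above. The helper name growth_ratios and the data layout are assumptions made for this example; the numbers are copied from the results, not newly measured.

```perl
use strict;
use warnings;

# Timings copied from the results above:
# array length => [remove seconds, insert seconds]
my %timings = (
    100_000    => [0.0026, 0.0092],
    1_000_000  => [0.0296, 0.0617],
    10_000_000 => [0.2876, 0.6252],
);

# Hypothetical helper: ratio of run times between consecutive array sizes
# for one column (0 = remove, 1 = insert).
sub growth_ratios {
    my ($t, $col) = @_;
    my @sizes = sort { $a <=> $b } keys %$t;
    return map { $t->{ $sizes[$_] }[$col] / $t->{ $sizes[$_ - 1] }[$col] }
           1 .. $#sizes;
}

printf "remove: %s\n", join ", ", map { sprintf "%.1fx", $_ } growth_ratios(\%timings, 0);
printf "insert: %s\n", join ", ", map { sprintf "%.1fx", $_ } growth_ratios(\%timings, 1);
# remove: 11.4x, 9.7x
# insert: 6.7x, 10.1x
```

With only 100 splices per run the smallest timings are noisy, so ratios in the 7x to 11x range for a 10x larger array are about as close to linear as this benchmark can show.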

Your benchmarking of arrays versus linked list is flawed. The arrays method can be sped up using the following:

  1. Create an array of scalars instead of the superfluous array of hash references to match the linked list.

    This speeds up execution by a factor of 4.

  2. Since you're just doing a single pass of the list, create a new list instead of trying to splice the old one.

    This will increase speed by a factor of 10.

    Of course this doubles your memory, but using the linked list increases it by a factor of 5 at least.

The following are benchmarks showing these two improvements. I also simplified the linked list functionality, but the array method is still twice as fast even with improvements to both.

use strict;
use warnings;

use Benchmark;

my $INSERTION_FREQUENCY = 5;

my $num_of_items = shift or die "Specify size of list\n";

timethese(10, {
    'linked_list'  => sub { linked_list($num_of_items) },
#   'array_splice' => sub { array_splice($num_of_items) },
    'array_map'    => sub { array_map($num_of_items) },
});

sub linked_list {
    my $num_of_items = shift;

    my $curr_node = my $list_head = {data => 1};

    # Creating List 
    for my $i (2 .. $num_of_items) {
        $curr_node = $curr_node->{next} = {
            data => $i,
            prev => $curr_node,
        };
    }

    # Inserting Items
    $curr_node = $list_head;
    my $i = 0;
    while ($curr_node) {
        if (++$i % $INSERTION_FREQUENCY == 0) {
            my %new_node = (
                data => "inserted",
                prev => $curr_node->{"prev"},
                next => $curr_node,
            );
            $curr_node->{"prev"}{"next"} = \%new_node if $curr_node->{"prev"};
            $curr_node->{"prev"} = \%new_node;
        }
        $curr_node = $curr_node->{"next"};
    }

    return $list_head;
}

sub array_splice {
    my $num_of_items = shift;

    # Creating Array
    my @array = (1..$num_of_items);

    # Inserting Items
    for my $i (1 .. $num_of_items) {
        if ($i % $INSERTION_FREQUENCY == 0) {
            splice(@array, $i - 1, 0, "inserted");
        }
    }

    return \@array;
}

sub array_map {
    my $num_of_items = shift;

    # Creating Array
    my @array = (1..$num_of_items);

    # Inserting Items
    my $i = 0;
    @array = map {
        ++$i % $INSERTION_FREQUENCY == 0 ? ("inserted", $_) : $_
    } @array;

    return \@array;
}
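If splice has to be used anyway, the order of the insertions matters for where they land. The sketch below is one possible way to compare the two approaches on a small input: it rebuilds the list in a single pass with map, and separately splices from the end of the array backwards so that earlier indexes are not shifted by later insertions. The sub names insert_by_map and insert_by_splice_reverse are hypothetical and not part of the benchmark above. Note that the forward loop in array_splice indexes into the growing array, so its insertions land at slightly different positions than those of array_map and linked_list, although the amount of work is comparable.

```perl
use strict;
use warnings;

# Single pass: emit "inserted" before every $freq-th original element.
sub insert_by_map {
    my ($n, $freq) = @_;
    my $i = 0;
    return [ map { ++$i % $freq == 0 ? ("inserted", $_) : $_ } 1 .. $n ];
}

# Same positions via splice, walking backwards from the last multiple
# of $freq so that pending indexes stay valid after each insertion.
sub insert_by_splice_reverse {
    my ($n, $freq) = @_;
    my @array = (1 .. $n);
    for (my $i = $n - $n % $freq; $i >= $freq; $i -= $freq) {
        splice(@array, $i - 1, 0, "inserted");
    }
    return \@array;
}

my $by_map    = insert_by_map(12, 5);
my $by_splice = insert_by_splice_reverse(12, 5);
print "map:    @$by_map\n";
print "splice: @$by_splice\n";
print "same result\n" if "@$by_map" eq "@$by_splice";
```

The backwards splice still moves the tail of the array on every insertion, so it remains quadratic overall; it only serves to confirm that the single-pass rebuild produces the intended layout.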

Benchmarks

$ perl arrays.pl 100000
Benchmark: timing 10 iterations of array_map, array_splice, linked_list...
 array_map:  1 wallclock secs ( 0.58 usr +  0.01 sys =  0.59 CPU) @ 16.89/s (n=10)
array_splice: 16 wallclock secs (16.21 usr +  0.00 sys = 16.21 CPU) @  0.62/s (n=10)
linked_list:  2 wallclock secs ( 1.43 usr +  0.09 sys =  1.53 CPU) @  6.54/s (n=10)

$ perl arrays.pl 200000
Benchmark: timing 10 iterations of array_map, array_splice, linked_list...
 array_map:  1 wallclock secs ( 1.20 usr +  0.05 sys =  1.25 CPU) @  8.01/s (n=10)
array_splice: 64 wallclock secs (64.10 usr +  0.03 sys = 64.13 CPU) @  0.16/s (n=10)
linked_list:  3 wallclock secs ( 2.92 usr +  0.23 sys =  3.15 CPU) @  3.17/s (n=10)

$ perl arrays.pl 500000
Benchmark: timing 10 iterations of array_map, linked_list...
 array_map:  4 wallclock secs ( 3.12 usr +  0.36 sys =  3.48 CPU) @  2.87/s (n=10)
linked_list:  8 wallclock secs ( 7.52 usr +  0.70 sys =  8.22 CPU) @  1.22/s (n=10)

I've also made a benchmark and wanted to share the results with you.

In the results I got, the linked list is by far faster than the Perl array.

This is the benchmark I've done:

  1. Created a linked-list or an array with 1M items
  2. Iterated over the list/array and made 200K insertions in place
  3. Checked how much time each scenario took.

Linked list: 2 s
Perl array: 1 min 55 s

Here are the commands I ran and their results:

> time perl_benchmark.pl list 1000000
1.876u 0.124s 0:02.01 99.0%     0+0k 0+0io 0pf+0w
> time perl_benchmark.pl array 1000000
115.159u 0.104s 1:55.27 99.9%   0+0k 0+0io 0pf+0w

Source code:

my $INSERTION_FREQUENCY = 5;

my $use_list = $ARGV[0] eq "list";
my $num_of_items = $ARGV[1];

my $list_header;
my $list_tail;

my @array;

# Creating List or Array
for (my $i = 0 ; $i < $num_of_items ; $i++) {
    my %new_node;
    $new_node{"data"} = $i;
    if ($use_list) {        
        if (! defined($list_header)) {
            $list_header = $list_tail = \%new_node;
        } else {
            $new_node{"prev"} = $list_tail;
            $list_tail->{"next"} = \%new_node;          
            $list_tail = \%new_node;
        }
    } else {
        push(@array, \%new_node);
    }
}

# Inserting Items
my $curr_node = $list_header;
for (my $i = 1 ; $i < $num_of_items ; $i++) {
    if ($i % $INSERTION_FREQUENCY == 0) {
        my %new_node;
        $new_node{"data"} = "inserted";
        if ($use_list) {
            my $prev_ptr = $curr_node->{"prev"};
            if (defined($prev_ptr)) {
                $prev_ptr->{"next"} = \%new_node;
            }
            $new_node{"prev"} = $prev_ptr;
            $new_node{"next"} = $curr_node;
            $curr_node->{"prev"} = \%new_node;
        } else {
            splice(@array, $i - 1, 0, \%new_node);
        }
    }
    if ($use_list) {
        $curr_node = $curr_node->{"next"};
    }
}
