Perl speed comparison on using hash and hash reference

I am trying to work out whether it is better to use hashes or references to hashes. As I understand it, a hash ref is just a pointer to the hash itself, so I thought there should be no speed difference.

I ran a basic benchmark and found that hash refs are slower than using the hash directly, by 20-27% on average.

Here is the basic benchmark code I used:

use Benchmark qw(:all);

cmpthese(10_000_000, {
    hash    => sub { my %hash = (); },
    hashref => sub { my $hahref = {}; },
});

cmpthese(10_000_000, {
    hash => sub {
        my %hash;
        $hash{fname} = "first name";
        $hash{mname} = "middle name";
        $hash{lname} = "last name";
    },

    hashref => sub {
        my $hahref;
        $hahref->{fname} = "first name";
        $hahref->{mname} = "middle name";
        $hahref->{lname} = "last name";
    },

    hashrefs => sub {
        my $hahref;
        $$hahref{fname} = "first name";
        $$hahref{mname} = "middle name";
        $$hahref{lname} = "last name";
    },
});

And here are the benchmark results on a Dell laptop (Windows 8, Core i7, 16MB RAM):

             Rate hashref    hash
hashref 5045409/s      --    -17%
hash    6045949/s     20%      --

             Rate hashrefs  hashref     hash
hashrefs 615764/s       --      -2%     -21%
hashref  625978/s       2%       --     -19%
hash     775134/s      26%      24%       --

Output completed (1 min 6 sec consumed)

My question is: if my benchmark is correct and hash refs are that slow, why do most modules, such as DBI, use hash refs to return results? Most modules also accept hash refs rather than hashes as arguments, and return hash refs rather than hashes.

Hashes are faster to access elements from; hashrefs are faster to pass as arguments to a function, or to return as the result of a function. This makes sense if you think about it:

  • A hashref is basically a pointer to a hash, so when Perl sees $href->{xyz}, it needs to follow the pointer to find the hash, and then find element xyz in the hash. When Perl sees $hash{xyz}, it doesn't need to do that initial pointer-following step; it can find element xyz straight away.

  • Hashes cannot be passed directly to or from subs; they need to be flattened into a list of scalars. If a hash has, say, four keys and four values, then passing it to a sub means passing a list of eight scalars to the function. Inside the function, you'll probably have something like my %args = @_, which copies those eight scalars into a new hash. That is a lot of work. Passing a hashref is just a matter of passing a single scalar, so it's faster.
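
The flattening described in the second point can be observed directly. The following is a minimal sketch; count_args, bump_copy and bump_ref are hypothetical helper names made up for this illustration. It shows that a four-key hash arrives as eight scalars, that my %args = @_ produces an independent copy, and that a hashref arrives as one scalar pointing at the caller's hash.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

# Receives a flattened list and copies it into a brand-new hash.
sub bump_copy {
    my %args = @_;
    $args{foo}++;          # changes only the local copy
    return $args{foo};
}

# Receives a single scalar (the reference) and works on the caller's hash.
sub bump_ref {
    my ($args) = @_;
    $args->{foo}++;        # changes the caller's hash
    return $args->{foo};
}

my %h = (foo => 1, bar => 2, baz => 3, qux => 4);

my $n_flat = count_args(%h);     # four keys plus four values
my $n_ref  = count_args(\%h);    # one reference

bump_copy(%h);
my $after_copy = $h{foo};        # unchanged, the sub modified its own copy
bump_ref(\%h);
my $after_ref  = $h{foo};        # incremented through the reference

print "flattened: $n_flat scalars, by reference: $n_ref scalar\n";
print "foo after bump_copy: $after_copy, after bump_ref: $after_ref\n";
```

The copy is also why the two calling styles differ in behaviour, not just in speed: only the hashref version can modify the caller's data.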

Mostly this is micro-optimization, and you should just choose whichever data structure makes the most sense for your program. However, for those occasions when you really need to eke out every bit of speed you can, it is possible to have the best of both worlds...

Let's say you have a function which needs to accept a hash (or maybe a hashref; you haven't decided yet) and needs to add up the values stored under some of its keys. Here are your original two options:

sub add_hash {
    my %hash = @_;
    return $hash{foo} + $hash{bar} + $hash{baz};
}

sub add_hashref {
    my ($href) = @_;                                    # faster than add_hash
    return $href->{foo} + $href->{bar} + $href->{baz};  # slower than add_hash
}

Now let's pull out Data::Alias. This is a module that allows us to create a lexical variable which acts as an alias for another. In particular, we can make a lexical hash variable which acts as an alias for the hash that a hashref points to.

use Data::Alias;

sub add_hashref_2 {
    my ($href) = @_;                               # faster than add_hash
    alias my %hash = %$href;                       # ... magic happens ...
    return $hash{foo} + $hash{bar} + $hash{baz};   # faster than add_hashref
}

Or better still:

use Data::Alias;

sub add_hashref_3 {
    alias my %hash = %{ $_[0] };
    return $hash{foo} + $hash{bar} + $hash{baz};
}

... which avoids the initial list assignment.
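
For readers who would rather not install a CPAN module, a similar alias can be sketched with the core refaliasing feature instead of Data::Alias. This is a different mechanism from the one used above: it assumes Perl 5.22 or later, the feature is still marked experimental, and add_hashref_4 is a hypothetical name continuing the numbering of the earlier subs.

```perl
use v5.22;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

sub add_hashref_4 {
    # Alias the lexical %hash to the hash behind the first argument.
    # No key/value pairs are copied.
    \my %hash = $_[0];
    return $hash{foo} + $hash{bar} + $hash{baz};
}

my $sum = add_hashref_4({ foo => 1, bar => 2, baz => 3 });
say $sum;
```

Whether this is any faster than plain $href->{...} access on a given Perl build is something to measure rather than assume.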

I stress that this is micro-optimization. There are usually far better ways to speed up your code: memoization, radical algorithm changes, rewriting selected hot code paths in XS, etc. But there are some (very limited) occasions when this sort of magic can help.

Your benchmark is faulty.

Your hashref examples not only use a hash, but also create a new one on each iteration. The plain hash examples are optimized by Perl to always reuse the same hash.

If you amend your second benchmark to force the plain hash version to always create a new hash, the hashref version becomes faster:

cmpthese(10_000_000, {
    hash => sub {
        my %hash;
        $hash{fname}="first name";
        $hash{mname}="middle name";
        $hash{lname}="last name";
        return \%hash;          
    },
    hashref => sub {
        my $hahref;
        $hahref->{fname}="first name"; 
        $hahref->{mname}="middle name";
        $hahref->{lname}="last name"; 
        return $hahref;         
    },
} );
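
Another way to take hash creation out of the picture, sketched here as a suggestion rather than as part of the original benchmark, is to build the hash and the hashref once, outside the timed subs, so that only element access is measured. The %tests name and the string concatenation used to touch all three elements are assumptions of this sketch.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Built once, outside the timed code, so only element access is measured.
my %hash    = (fname => "first name", mname => "middle name", lname => "last name");
my $hashref = { %hash };

my %tests = (
    hash    => sub { return $hash{fname} . $hash{mname} . $hash{lname} },
    hashref => sub { return $hashref->{fname} . $hashref->{mname} . $hashref->{lname} },
);

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, \%tests);
```

Results will vary by machine and Perl version, so treat any difference it reports as specific to your setup.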

But the real point here is: stop trying to micro-optimize. Write your code the way that makes sense, and only when there proves to be a problem, look narrowly at the code that is actually performing badly.

Of course accessing a hash element through a reference will be slower than accessing a hash element directly. Extra work requires extra time.

But how much longer does it take? According to your tests,

( 1 / (5045409/s) - 1/(6045949/s) ) / 3 derefs
= 0.000,000,011 s/deref
= 11 ns/deref
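
The same arithmetic can be reproduced in a few lines of Perl. This is only a sketch of the calculation above: the two rates are copied from the question's first results table, and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Rates (iterations per second) taken from the question's results.
my $hashref_rate = 5_045_409;
my $hash_rate    = 6_045_949;
my $derefs       = 3;    # divisor used in the calculation above

# Time per iteration is the inverse of the rate; the difference between
# the two variants is then spread over the dereferences.
my $seconds_per_deref = (1 / $hashref_rate - 1 / $hash_rate) / $derefs;
my $ns_per_deref      = $seconds_per_deref * 1e9;

printf "%.0f ns/deref\n", $ns_per_deref;
```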

This isn't what you should be worrying about!

"if my benchmark is correct and the hash refs are so slow"

Your benchmark doesn't show them being slow.

"why most modules like DBI use hash refs to return results."

As opposed to what? The only thing a sub can return is a list of scalars. It can't return a hash. fetchrow_hashref could return a list of key-value pairs from which you could build a hash, but if it had already built the hash inside the sub, that would be far slower than returning a reference to it.

As I understand it (and I could well be mistaken), the biggest advantage of returning a reference rather than the actual data structure lies in the assignment outside the subroutine, not in the performance of accessing the structure. Returning a reference does not copy the data in memory on assignment.

I would expect the second example to be slower.

my $data = getData();

sub getData {
    return { a => '1' };
}

vs

my %data = getData();

sub getData {
    return my %hash = ( a => '1' );
}
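
The "no copy on assignment" point can be checked by comparing addresses. This is a minimal sketch; %store, get_ref and get_list are hypothetical names used only for this illustration. A returned reference points at the very same hash, whereas a returned hash is flattened and rebuilt as a separate copy by the caller.

```perl
use strict;
use warnings;

my %store = (a => 1);

sub get_ref  { return \%store }   # returns a single scalar
sub get_list { return %store }    # returns a flattened key/value list

my $ref  = get_ref();
my %copy = get_list();            # the caller builds a new hash from the list

# References compare equal numerically when they point at the same hash.
my $same_hash   = ($ref == \%store)   ? 1 : 0;
my $copy_is_new = (\%copy != \%store) ? 1 : 0;

$copy{a} = 99;                    # touches only the copy
my $after_copy_write = $store{a};

$ref->{a} = 2;                    # writes through to %store
my $after_ref_write = $store{a};

print "same hash via ref: $same_hash, copy is separate: $copy_is_new\n";
print "store{a} after copy write: $after_copy_write, after ref write: $after_ref_write\n";
```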
