Python vs perl sort performance

Solution

This solved all issues with my Perl code (plus some extra implementation code... :-) ). In conclusion, both Perl and Python are equally awesome.

use WWW::Curl::Easy;

Thanks to ALL who responded, very much appreciated.

Edit

It appears that the Perl code I am using is spending the majority of its time performing the HTTP GET, for example:

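# Assumed setup; the original snippet did not show how $ua was created.
use Time::HiRes qw(gettimeofday);
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;

my $start_time = gettimeofday;
$request = HTTP::Request->new('GET', 'http://localhost:8080/data.json');
$response = $ua->request($request);
$page = $response->content;
my $end_time = gettimeofday;
print "Time taken @{[ $end_time - $start_time ]} seconds.\n";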

The result is:

Time taken 74.2324419021606 seconds.

My Python code in comparison:

import time
import requests

start = time.time()
r = requests.get('http://localhost:8080/data.json', timeout=120, stream=False)

maxsize = 100000000
content = b''  # iter_content yields bytes
for chunk in r.iter_content(2048):
    content += chunk
    if len(content) > maxsize:
        r.close()
        raise ValueError('Response too large')

end = time.time()
timetaken = end-start
print(timetaken)

The result is:

20.3471381664

In both cases the sort times are sub-second. So first of all I apologise for the misleading question, and it is another lesson for me to never, ever make assumptions... :-)

I'm not sure what the best thing to do with this question is now. Perhaps someone can propose a better way of performing the request in Perl?

End of edit

This is just a quick question regarding sort performance differences in Perl vs Python. This is not a question about which language is better/faster etc. For the record, I first wrote this in Perl, noticed the time the sort was taking, and then tried to write the same thing in Python to see how fast it would be. I simply want to know, how can I make the Perl code perform as fast as the Python code?

Let's say we have the following JSON:

{"3434343424335": {
        "key1": 2322,
        "key2": 88232,
        "key3": 83844,
        "key4": 444454,
        "key5": 34343543,
        "key6": 2323232
    },
"78237236343434": {
        "key1": 23676722,
        "key2": 856568232,
        "key3": 838723244,
        "key4": 4434544454,
        "key5": 3432323543,
        "key6": 2323232
    }
}

Let's say we have a list of around 30k-40k records which we want to sort by one of the sub keys. We then want to build a new array of records ordered by the sub key.

Perl - Takes around 27 seconds

my @list;
$decoded = decode_json($page);
foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}, ...etc});
}

Python - Takes around 6 seconds

list = []
data = json.loads(content)
data2 = sorted(data, key = lambda x: data[x]['key5'], reverse=True)

for key in data2:
     tmp= {'id':key,'key1':data[key]['key1'],etc.....}
     list.append(tmp)

For the Perl code, I have tried using the following tweaks:

use sort '_quicksort';  # use a quicksort algorithm
use sort '_mergesort';  # use a mergesort algorithm

Your benchmark is flawed: you're benchmarking multiple variables, not one. It is not just sorting data; it is also doing JSON decoding, creating strings, and appending to an array. You can't know how much time is spent sorting and how much is spent doing everything else.

The matter is made worse in that there are several different JSON implementations in Perl, each with its own performance characteristics. Change the underlying JSON library and the benchmark will change again.

If you want to benchmark sort, you'll have to change your benchmark code to eliminate the cost of loading your test data from the benchmark, JSON or not.
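To make that concrete, here is a minimal Python sketch that times each phase separately, so the sort is no longer lumped in with decoding and list building. The test data is synthetic and the names (`page`, `timings`, `ordered_ids`) are made up for illustration; only the record shape comes from the question.

```python
import json
import random
import time

# Hypothetical stand-in for the downloaded page: 40,000 records shaped
# like the JSON in the question (ids and values are made up).
random.seed(0)
page = json.dumps({
    str(i): {"key%d" % k: random.randrange(10**9) for k in range(1, 7)}
    for i in range(40000)
})

t0 = time.perf_counter()
data = json.loads(page)                              # phase 1: JSON decoding
t1 = time.perf_counter()
ordered_ids = sorted(data, key=lambda k: data[k]["key5"], reverse=True)  # phase 2: sort
t2 = time.perf_counter()
result = [dict(data[k], id=k) for k in ordered_ids]  # phase 3: build the list
t3 = time.perf_counter()

timings = {"decode": t1 - t0, "sort": t2 - t1, "build": t3 - t2}
for phase, seconds in timings.items():
    print("%-6s %.4f s" % (phase, seconds))
```

If one phase dominates, that is the one worth optimising; the absolute numbers will vary from machine to machine.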

Perl and Python have their own internal benchmarking libraries that can benchmark individual functions, but their instrumentation can make them perform far less well than they would in the real world. The performance drag from each benchmarking implementation will be different and might introduce a false bias. These benchmarking libraries are more useful for comparing two functions in the same program. For comparing between languages, keep it simple.

The simplest thing to do to get an accurate benchmark is to time them within the program using the wall clock.

# The current time to the microsecond.
use Time::HiRes qw(gettimeofday);

my @list;
my $decoded = decode_json($page);

my $start_time = gettimeofday;

foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
    push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}, ...etc});
}

my $end_time = gettimeofday;

print "sort and append took @{[ $end_time - $start_time ]} seconds\n";

(I leave the Python version as an exercise)
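For what it's worth, one way the Python version might look is sketched below. The two-record `page` string is a cut-down sample from the question, and `sort_and_append` is a name invented here; the point is only that decoding happens before the clock starts.

```python
import json
import time

def sort_and_append(decoded):
    """Sort record ids by key5, descending, and build the output list."""
    out = []
    for rec_id in sorted(decoded, key=lambda k: decoded[k]["key5"], reverse=True):
        record = {"key": rec_id}
        record.update(decoded[rec_id])
        out.append(record)
    return out

# Tiny sample based on the question; a real run would use the full page.
page = ('{"3434343424335": {"key1": 2322, "key5": 34343543},'
        ' "78237236343434": {"key1": 23676722, "key5": 3432323543}}')
decoded = json.loads(page)   # decoding is done before the clock starts

start_time = time.time()
result = sort_and_append(decoded)
end_time = time.time()

print("sort and append took %s seconds" % (end_time - start_time))
```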

From here you can improve your technique. You can use CPU seconds instead of wall clock. The array append and the cost of creating the string are still involved in the benchmark; they can be eliminated so you're just benchmarking sort. And so on.
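A rough Python sketch of both refinements, assuming synthetic in-memory data (the `decoded` dict here is made up): `time.process_time()` reports CPU seconds for the process, `time.perf_counter()` reports wall-clock time, and nothing but the sort sits between the start and stop readings.

```python
import random
import time

random.seed(1)
# Hypothetical test data: only the field that is sorted on.
decoded = {str(i): {"key5": random.randrange(10**9)} for i in range(40000)}

wall_start = time.perf_counter()   # wall clock
cpu_start = time.process_time()    # CPU seconds used by this process

# Only the sort is timed: no decoding, no appends, no new records.
ordered_ids = sorted(decoded, key=lambda k: decoded[k]["key5"], reverse=True)

cpu_seconds = time.process_time() - cpu_start
wall_seconds = time.perf_counter() - wall_start

print("sort only: %.4f CPU s, %.4f wall s" % (cpu_seconds, wall_seconds))
```

A large gap between the two numbers usually means the process was waiting on something (I/O, other processes) rather than computing.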

Additionally, you can use a profiler to find out where your programs are spending their time. These have the same raw performance caveats as benchmarking libraries; the results are only useful for finding out what percentage of its time a program is spending where, but a profiler will prove useful to quickly see if your benchmark has unexpected drag.
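On the Python side, the standard library's `cProfile` and `pstats` modules are one option. The sketch below profiles a made-up `run` function that decodes, sorts and rebuilds a synthetic page (`build_page` is also invented for the example); read the output for proportions, not absolute speed.

```python
import cProfile
import io
import json
import pstats
import random

def build_page(n):
    """Hypothetical helper: JSON text with n records shaped like the question's."""
    return json.dumps({
        str(i): {"key%d" % k: random.randrange(10**9) for k in range(1, 7)}
        for i in range(n)
    })

def run(page):
    data = json.loads(page)
    ordered = sorted(data, key=lambda k: data[k]["key5"], reverse=True)
    return [dict(data[k], id=k) for k in ordered]

random.seed(2)
page = build_page(5000)

profiler = cProfile.Profile()
profiler.enable()
result = run(page)
profiler.disable()

# Show the eight entries with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(8)
print(report.getvalue())
```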

The important thing is to benchmark what you think you're benchmarking.

Something else is at play here; I can run your sort in half a second. Improving that is not going to depend on the sorting algorithm so much as on reducing the amount of code run per comparison; a Schwartzian Transform gets it to a third of a second, and a Guttman-Rosler Transform gets it down to a quarter of a second:

#!/usr/bin/perl
use 5.014;
use warnings;

my $decoded = { map( (int rand 1e9, { map( ("key$_", int rand 1e9), 1..6 ) } ), 1..40000 ) };

use Benchmark 'timethese';

timethese( -5, {
    'original' => sub {
        my @list;
        foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
            push(@list,{"key"=>$id,%{$decoded->{$id}}});
        }
    },
    'st' => sub {
        my @list;
        foreach my $id (
            map $_->[1],
            sort { $b->[0] <=> $a->[0] }
            map [ $decoded->{$_}{key5}, $_ ],
            keys %{$decoded}
        ) {
            push(@list,{"key"=>$id,%{$decoded->{$id}}});
        }
    },
    'grt' => sub {
        my $maxkeylen=15;
        my @list;
        foreach my $id (
            map substr($_,$maxkeylen),
            sort { $b cmp $a }
            map sprintf('%0*s', $maxkeylen, $decoded->{$_}{key5}) . $_,
            keys %{$decoded}
        ) {
            push(@list,{"key"=>$id,%{$decoded->{$id}}});
        }
    },
});
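For comparison, here is a hedged Python sketch of the same decorate-sort-undecorate idea behind the Schwartzian Transform. Note that `sorted(..., key=...)` already calls the key function only once per element, so the gain seen in Perl may not carry over; the function names and the synthetic `decoded` data are assumptions made for the example.

```python
import random
import timeit

random.seed(3)
# Synthetic data, loosely mirroring the Perl generator above.
decoded = {
    str(random.randrange(10**9)): {
        "key%d" % k: random.randrange(10**9) for k in range(1, 7)
    }
    for _ in range(40000)
}

def lookup_in_key():
    # The key function does two dictionary lookups per element.
    return sorted(decoded, key=lambda k: decoded[k]["key5"], reverse=True)

def decorate_sort_undecorate():
    # Pair each id with its sort value once, sort the pairs, then strip
    # the sort value again. Ties on key5 fall back to comparing the ids.
    decorated = [(rec["key5"], rec_id) for rec_id, rec in decoded.items()]
    decorated.sort(reverse=True)
    return [rec_id for _, rec_id in decorated]

for fn in (lookup_in_key, decorate_sort_undecorate):
    seconds = timeit.timeit(fn, number=5)
    print("%-26s %.4f s for 5 runs" % (fn.__name__, seconds))
```

The two functions can order records with equal key5 values differently, so compare them by the sorted values rather than by id when checking that they agree.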

Don't create a new hash for each record. Just add the key to the existing one.

$decoded->{$_}{key} = $_
   for keys(%$decoded);

my @list = sort { $b->{key5} <=> $a->{key5} } values(%$decoded);

Using Sort::Key will make it even faster.

use Sort::Key qw( rukeysort );

$decoded->{$_}{key} = $_
   for keys(%$decoded);

my @list = rukeysort { $_->{key5} } values(%$decoded);
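The same idea, reusing the existing records and sorting them directly, carries over to the Python side. A minimal sketch, assuming a two-record sample based on the question:

```python
from operator import itemgetter

decoded = {
    "3434343424335": {"key1": 2322, "key5": 34343543},
    "78237236343434": {"key1": 23676722, "key5": 3432323543},
}

# Store each id inside its own record instead of building a new dict per record.
for rec_id, record in decoded.items():
    record["key"] = rec_id

# Sort the records themselves; itemgetter extracts key5 without a lambda.
records = sorted(decoded.values(), key=itemgetter("key5"), reverse=True)

print([r["key"] for r in records])
```

The trade-off is that the decoded data is modified in place, which is fine when nothing else needs the original structure.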
