Python faster than compiled Haskell?

I have a simple script written in both Python and Haskell. It reads a file with 1,000,000 newline-separated integers, parses that file into a list of integers, quicksorts it, and then writes the sorted list to a different file. That file has the same format as the unsorted one. Simple.

Here is Haskell:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

main = do
    file <- readFile "data"
    let un = lines file
    let f = map (\x -> read x::Int ) un
    let done = quicksort f
    writeFile "sorted" (unlines (map show done))

And here is Python:

def qs(ar):
    if len(ar) == 0:
        return ar

    p = ar[0]
    return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p])


def read_file(fn):
    f = open(fn)
    data = f.read()
    f.close()
    return data

def write_file(fn, data):
    f = open(fn, 'w')  # use the fn argument instead of a hard-coded name
    f.write(data)
    f.close()


def main():
    data = read_file('data')

    lines = data.split('\n')
    lines = [int(l) for l in lines]

    done = qs(lines)
    done = [str(l) for l in done]

    write_file('sorted', "\n".join(done))

if __name__ == '__main__':
    main()

Very simple. Now I compile the Haskell code with

$ ghc -O2 --make quick.hs

And I time those two with:

$ time ./quick
$ time python qs.py

Results:

Haskell:

real    0m10.820s
user    0m10.656s
sys     0m0.154s

Python:

real    0m9.888s
user    0m9.669s
sys     0m0.203s

How can Python possibly be faster than native-code Haskell?

Thanks

EDIT:

  • Python version: 2.7.1
  • GHC version: 7.0.4
  • Mac OS X 10.7.3
  • 2.4GHz Intel Core i5

The list was generated by:

from random import shuffle
a = [str(a) for a in xrange(0, 1000*1000)]
shuffle(a)
s = "\n".join(a)
f = open('data', 'w')
f.write(s)
f.close()

So all numbers are unique.

The Original Haskell Code

There are two issues with the Haskell version:

  • You're using String IO, which builds linked lists of characters.
  • You're using a non-quicksort that looks like quicksort.

This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop (GHC 7.4 using -O2).
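To make the second point concrete: a textbook quicksort partitions one array in place, whereas the list version above allocates two fresh lists at every level of recursion. The following is a minimal sketch in Python rather than Haskell, and it is not the code timed in this answer; the name quicksort_in_place and the middle-element pivot are illustrative assumptions.

```python
def quicksort_in_place(items, lo=0, hi=None):
    """Sort items[lo..hi] in place; no intermediate lists are allocated."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        pivot = items[(lo + hi) // 2]  # assumption: middle element as pivot
        i, j = lo, hi
        while i <= j:
            # Skip elements that are already on the correct side.
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            quicksort_in_place(items, lo, j)
            lo = i
        else:
            quicksort_in_place(items, i, hi)
            hi = j
```

The mutable-vector versions further down rely on the same idea: the data lives in one contiguous buffer and the sort rearranges it without building new lists.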

Daniel's ByteString Version

This is much improved, but notice it still uses the inefficient built-in merge sort.

His version takes 8.1 seconds (and doesn't handle negative numbers, but that's more of a non-issue for this exploration).

Note

From here on this answer uses the following packages: vector, attoparsec, text and vector-algorithms. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).

A Text Version

I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector in the ST monad:

import Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TIO
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Applicative
import Control.Monad.ST
import System.Environment (getArgs)

parser = many (decimal <* char '\n')

main = do
    numbers <- TIO.readFile =<< fmap head getArgs
    case parse parser numbers of
        Done t r | T.null t -> writeFile "sorted" . unlines
                                                  . map show . vsort $ r
        x -> error $ Prelude.take 40 (show x)

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')

This runs in 4 seconds (and also doesn't handle negatives).

Return to the ByteString

So now we know we can make a more general program that's faster. What about making the ASCII-only version fast? No problem!

import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Attoparsec.ByteString.Lazy (parse,  Result(..))
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import Control.Applicative ((<*), many)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Monad.ST


parser = many (decimal <* char '\n')

main = do
    numbers <- BS.readFile "rands"
    case parse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines
                                                   . map show . vsort $ r

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')

This runs in 2.3 seconds.

Producing a Test File

Just in case anyone's curious, my test file was produced by:

import Control.Monad.CryptoRandom
import Crypto.Random
main = do
  g <- newGenIO :: IO SystemRandom
  let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int])
  writeFile "rands" (unlines $ map show rs)

If you're wondering why vsort isn't packaged in some easier form on Hackage... so am I.

In short, don't use read. Replace read with a function like this:

import Numeric

fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n

I get a pretty fair speedup:

~/programming% time ./test.slow
./test.slow  9.82s user 0.06s system 99% cpu 9.901 total
~/programming% time ./test.fast
./test.fast  6.99s user 0.05s system 99% cpu 7.064 total
~/programming% time ./test.bytestring
./test.bytestring  4.94s user 0.06s system 99% cpu 5.026 total

Just for fun, the above results include a version that uses ByteString (and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it hands the sorting off to the standard library's sort function. The full code is below.

import qualified Data.ByteString as BS
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
import Data.List

parser = many (decimal <* char '\n')

reallyParse p bs = case parse p bs of
    Partial f -> f BS.empty
    v -> v

main = do
    numbers <- BS.readFile "data"
    case reallyParse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r

I'm more a Pythonista than a Haskellite, but I'll take a stab:

  1. There's a fair bit of overhead in your measured runtime just from reading and writing the files, which is probably pretty similar between the two programs. Also, make sure you've warmed up the cache for both programs.

  2. Most of your time is spent making copies of lists and fragments of lists. Python list operations are heavily optimized, being one of the most frequently used parts of the language, and list comprehensions are usually pretty performant too, spending much of their time in C-land inside the Python interpreter. Your code contains little of the stuff that is slowish in Python but wicked fast in static languages, such as attribute lookups on object instances.

  3. Your Python implementation throws away numbers that are equal to the pivot, so by the end it may be sorting fewer items, giving it an obvious advantage. (If there are no duplicates in the data set you're sorting, this isn't an issue.) Fixing this bug probably requires making another copy of most of the list in each call to qs(), which would slow Python down a little more.

  4. You don't mention what version of Python you're using. If you're using 2.x, you could probably get Haskell to beat Python just by switching to Python 3.x. :-)

I'm not too surprised the two languages are basically neck-and-neck here (a 10% difference is not noteworthy). Using C as a performance benchmark, Haskell loses some performance for its lazy functional nature, while Python loses some performance due to being an interpreted language. A decent match.
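Point 3 above can be sketched as follows. This is one possible fix, not the asker's code: the name qs_keep_duplicates is made up for illustration, and the extra comprehension for elements equal to the pivot is the additional copy that point 3 anticipates.

```python
def qs_keep_duplicates(ar):
    """List-based quicksort that keeps elements equal to the pivot."""
    if len(ar) <= 1:
        return ar
    p = ar[0]
    lesser = [i for i in ar if i < p]
    equal = [i for i in ar if i == p]  # includes the pivot itself
    greater = [i for i in ar if i > p]
    return qs_keep_duplicates(lesser) + equal + qs_keep_duplicates(greater)
```

With unique input, as in the question's data file, this returns the same result as the original qs() but does one more pass over the list per call.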

Since Daniel Wagner posted an optimized Haskell version using the built-in sort, here's a similarly optimized Python version using list.sort():

mylist = [int(x.strip()) for x in open("data")]
mylist.sort()
open("sorted", "w").write("\n".join(str(x) for x in mylist))

It takes 3.5 seconds on my machine, vs. about 9 for the original code. That is pretty much still neck-and-neck with the optimized Haskell. Reason: it's spending most of its time in C-programmed libraries. Also, TimSort (the sort used in Python) is a beast.
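To check where the time actually goes (file I/O and parsing versus the sort itself), each phase can be timed separately. The sketch below assumes Python 3 (time.perf_counter is not available in 2.7); the function name timed_sort and the returned dictionary are illustrative choices, not part of the script above.

```python
import time


def timed_sort(in_path, out_path):
    """Read, sort and write integers, returning seconds spent per phase."""
    t0 = time.perf_counter()
    with open(in_path) as f:
        numbers = [int(line) for line in f if line.strip()]
    t1 = time.perf_counter()

    numbers.sort()  # TimSort, implemented in C inside CPython
    t2 = time.perf_counter()

    with open(out_path, "w") as f:
        f.write("\n".join(map(str, numbers)))
    t3 = time.perf_counter()

    return {"read": t1 - t0, "sort": t2 - t1, "write": t3 - t2}
```

Calling timed_sort("data", "sorted") a couple of times in a row also helps with the cache warm-up concern raised earlier, since the second run reads the input from the OS file cache.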

This is after the fact, but I think most of the trouble is in the Haskell writing. The following module is pretty primitive (one should probably use builders, and certainly avoid the ridiculous round trip via String for showing), but it is simple and did distinctly better than pypy with kindall's improved Python, and better than the 2 and 4 second Haskell modules elsewhere on this page. (It surprised me how much they were using lists, so I made a couple more turns of the crank.)

$ time aa.hs        real    0m0.709s
$ time pypy aa.py   real    0m1.818s
$ time python aa.py real    0m3.103s

I'm using the sort recommended for unboxed vectors from vector-algorithms. The use of Data.Vector.Unboxed in some form is clearly now the standard, naive way of doing this sort of thing; it's the new Data.List (for Int, Double, etc.). Everything but the sort is irritating IO management, which I think could still be massively improved, on the write end in particular. The reading and sorting together take about 0.2 sec, as you can see from asking it to print what's at a bunch of indexes instead of writing to file, so twice as much time is spent writing as on everything else. If pypy is spending most of its time using timsort or whatever, then it looks like the sorting itself is surely massively better in Haskell, and just as simple, if you can just get your hands on the darned vector...

I'm not sure why there aren't convenient functions around for reading and writing vectors of unboxed things from natural formats. If there were, this would be three lines long, would avoid String and be much faster, but maybe I just haven't seen them.

import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Unboxed as V
import Data.Vector.Algorithms.Radix 
import System.IO

main  = do  unsorted <- fmap toInts (BL.readFile "data")
            vec <- V.thaw unsorted
            sorted <- sort vec >> V.freeze vec
            withFile "sorted" WriteMode $ \handle ->
               V.mapM_ (writeLine handle) sorted

writeLine :: Handle -> Int -> IO ()
writeLine h int = B.hPut h $ B.pack (show int ++ "\n")

toInts :: BL.ByteString -> V.Vector Int
toInts bs = V.unfoldr oneInt (BL.cons ' ' bs) 

oneInt :: BL.ByteString -> Maybe (Int, BL.ByteString)
oneInt bs = if BL.null bs then Nothing else 
               let bstail = BL.tail bs
               in if BL.null bstail then Nothing else BL.readInt bstail

To follow up on @kindall's interesting answer: those timings depend on the Python/Haskell implementation you use, the hardware configuration on which you run the tests, and the algorithm implementation you write in each language.

Nevertheless, we can try to get some good hints about the relative performance of one language implementation compared to another, or of one language compared to another. Well-known algorithms like quicksort are a good place to start.

To illustrate a Python/Python comparison, I just tested your script on CPython 2.7.3 and PyPy 1.8 on the same machine:

  • CPython: ~8s
  • PyPy: ~2.5s

This shows there can be room for improvement in a language implementation; maybe compiled Haskell is not doing the best possible job of compiling your code. If you are after speed in Python, also consider switching to PyPy if needed and if your code permits it.
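A comparison like the one above can be repeated with a small harness that is run unchanged under each interpreter. This is only a sketch: best_time and report are hypothetical helper names, the input size is an arbitrary assumption, and qs is the quicksort from the question (which is safe here because the generated values are unique).

```python
import platform
import random
import timeit


def qs(ar):
    # The question's quicksort; it drops duplicates of the pivot,
    # which does not matter for the unique values generated below.
    if len(ar) == 0:
        return ar
    p = ar[0]
    return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p])


def best_time(sort_fn, size=100000, repeat=3):
    """Best wall-clock time in seconds for sorting a shuffled range."""
    data = list(range(size))
    random.shuffle(data)
    # Copy the list on every run so each run sees the same unsorted input.
    return min(timeit.repeat(lambda: sort_fn(list(data)), number=1, repeat=repeat))


def report(sort_fn, size=100000):
    """Label the timing with the interpreter name, e.g. CPython or PyPy."""
    return "%s: %.3fs" % (platform.python_implementation(), best_time(sort_fn, size))
```

Printing report(qs) and report(sorted) under both CPython and PyPy gives numbers that are comparable on the same machine; absolute values will differ from the ones quoted here.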

I noticed a problem that nobody else seems to have noticed; both your Haskell and Python code have it. (Please tell me if the optimizer fixes it automatically; I know nothing about optimizations.) I will demonstrate it in Haskell. In your code you define the lesser and greater lists like this:

where lesser = filter (<p) xs
      greater = filter (>=p) xs

This is bad, because you compare each element of xs with p twice: once for the lesser list, and again for the greater list. This (theoretically; I haven't checked timings) makes your sort use twice as many comparisons; this is a disaster. Instead, you should write a function that splits a list into two lists using a predicate, in such a way that

split f xs

is equivalent to

(filter f xs, filter (not.f) xs)

Using this kind of function, you only need to compare each element in the list once to know which side of the tuple to put it in.
Okay, let's do it:

where
    split :: (a -> Bool) -> [a] -> ([a], [a])
    split _ [] = ([],[])
    split f (x:xs)
        |f x       = let (a,b) = split f xs in (x:a,b)
        |otherwise = let (a,b) = split f xs in (a,x:b)

Now let's replace the lesser/greater definitions with

let (lesser, greater) = split (p>) xs in (insert function here)

Full code:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) =
    let (lesser, greater) = splitf (p>) xs
    in (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        splitf :: (a -> Bool) -> [a] -> ([a], [a])
        splitf _ [] = ([],[])
        splitf f (x:xs)
            |f x       = let (a,b) = splitf f xs in (x:a,b)
            |otherwise = let (a,b) = splitf f xs in (a,x:b)

For some reason I couldn't write the greater/lesser part in the where clause, so I had to write it in a let clause. Also, if it is not tail-recursive, let me know and fix it for me (I don't yet fully understand how tail recursion works).

Now you should do the same for the Python code. I don't know Python, so I can't do it for you.

EDIT: there actually happens to already be such a function in Data.List, called partition. Note that this proves the need for this kind of function, because otherwise it wouldn't be defined. This shrinks the code to:

import Data.List (partition)

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) =
    let (lesser, greater) = partition (p>) xs
    in (quicksort lesser) ++ [p] ++ (quicksort greater)
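For the Python side mentioned above, a single-pass split could look like the sketch below. Python has no built-in equivalent of Data.List's partition, so the helper here is a hypothetical one written for illustration. Whether it actually beats two list comprehensions in CPython is an open question that I have not measured: the per-element lambda call may cost more than the second comparison it saves.

```python
def partition(pred, items):
    """Split items into (matching, non_matching) in a single pass."""
    yes, no = [], []
    for x in items:
        # Each element is tested exactly once.
        (yes if pred(x) else no).append(x)
    return yes, no


def quicksort(ar):
    if not ar:
        return []
    p, rest = ar[0], ar[1:]
    lesser, greater = partition(lambda x: x < p, rest)
    return quicksort(lesser) + [p] + quicksort(greater)
```

Because elements equal to the pivot land in the greater list, this version also keeps duplicates, matching the behaviour of the Haskell code.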

Python is really optimized for this sort of thing. I suspect that Haskell isn't. Here's a similar question that provides some very good answers.
