As part of a programming challenge, I need to read, from stdin, a sequence of space-separated integers (on a single line), and print the sum of those integers to stdout. The sequence in question can contain as many as 10,000,000 integers.
I have two solutions for this: one written in Haskell (foo.hs), and another, equivalent one, written in Python 2 (foo.py). Unfortunately, the (compiled) Haskell program is consistently slower than the Python program, and I'm at a loss to explain the discrepancy in performance between the two programs; see the Benchmark section below. If anything, I would have expected Haskell to have the upper hand...
What am I doing wrong? How can I account for this discrepancy? Is there an easy way of speeding up my Haskell code?
(For information, I'm using a mid-2010 MacBook Pro with 8 GB of RAM, GHC 7.8.4, and Python 2.7.9.)
foo.hs
main = print . sum =<< getIntList
getIntList :: IO [Int]
getIntList = fmap (map read . words) getLine
(compiled with ghc -O2 foo.hs)
foo.py
ns = map(int, raw_input().split())
print sum(ns)
Benchmark
In the following, test.txt consists of a single line of 10 million space-separated integers.
# Haskell
$ time ./foo < test.txt
1679257
real 0m36.704s
user 0m35.932s
sys 0m0.632s
# Python
$ time python foo.py < test.txt
1679257
real 0m7.916s
user 0m7.756s
sys 0m0.151s
read is slow. For bulk parsing, use bytestring or text primitives, or attoparsec.
I did some benchmarking. Your original version ran in 23.9 secs on my computer. The version below ran in 0.35 secs:
import qualified Data.ByteString.Char8 as B
import Control.Applicative
import Data.Maybe
import Data.List
import Data.Char
main = print . sum =<< getIntList
getIntList :: IO [Int]
getIntList =
  map (fst . fromJust . B.readInt) . B.words <$> B.readFile "test.txt"
By specializing the parser to your test.txt file, I could get the runtime down to 0.26 sec:
getIntList :: IO [Int]
getIntList =
  unfoldr (B.readInt . B.dropWhile (==' ')) <$> B.readFile "test.txt"
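Both versions above read test.txt directly, whereas the question asks for stdin. As a minimal sketch (not benchmarked here), the same B.readInt loop can be written as a pure function over a ByteString and fed from B.getContents. The name sumInts is hypothetical, and the strict accumulator is my own choice rather than part of the measured code.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import Data.Char (isSpace)

-- | Sum the whitespace-separated integers in the input.
-- Assumption: parsing stops silently at the first token that is not an
-- integer, so malformed input yields a partial sum rather than an error.
sumInts :: B.ByteString -> Int
sumInts = go 0
  where
    -- The bang keeps the running total evaluated instead of building thunks.
    go !acc bs = case B.readInt (B.dropWhile isSpace bs) of
      Just (n, rest) -> go (acc + n) rest
      Nothing        -> acc
```

To read from stdin as in the question, this could be wired up as main = B.getContents >>= print . sumInts.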
Read is slow
Fast read, from this answer, will bring you down to 5.5 seconds.
import Numeric
fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n
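One caveat with fastRead as written: readDec parses unsigned decimals only, and the single-alternative case fails at runtime on any other input. A minimal sketch of a total variant follows. The names readNat and fastReadMaybe are hypothetical, not from the linked answer, and no timing was measured for this version.

```haskell
import Numeric (readDec)

-- | Parse an unsigned decimal that spans the whole string.
readNat :: String -> Maybe Int
readNat s = case readDec s of
  [(n, "")] -> Just n
  _         -> Nothing   -- empty input, trailing junk, or no digits

-- | Like fastRead, but accepts a leading minus sign and returns Nothing
-- instead of crashing on malformed input.
fastReadMaybe :: String -> Maybe Int
fastReadMaybe ('-' : digits) = fmap negate (readNat digits)
fastReadMaybe digits         = readNat digits
```

With this, map fastReadMaybe . words yields a list of Maybe Int, which the caller can check or collapse as needed.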
Strings are Linked Lists
In Haskell the String
type is a linked list. Using a packed representation ( bytestring
if you really only want ascii but Text
is also very fast and supports unicode). As shown in this answer , the performance should then be neck and neck.
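For the Text route, a rough sketch might look like the following, assuming the text package is available. It uses signed and decimal from Data.Text.Read; the name sumText and the error handling are my own assumptions, and I have not timed it against the bytestring approach.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.Text as T
import qualified Data.Text.Read as TR

-- | Sum whitespace-separated integers held in a Text value.
-- Returns Left with a message on the first token that is not an integer.
sumText :: T.Text -> Either String Int
sumText = go 0 . T.words
  where
    go !acc []       = Right acc
    go !acc (w : ws) = case TR.signed TR.decimal w of
      Right (n, rest)
        | T.null rest -> go (acc + n) ws
        | otherwise   -> Left ("trailing input: " ++ T.unpack rest)
      Left err        -> Left err
```

Reading from stdin would then be something like TIO.getContents >>= either error print . sumText, with Data.Text.IO imported qualified as TIO.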
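I would venture to guess that a big part of your problem is actually words. When you map read . words, you first scan the input for whitespace, building an intermediate list of characters for each token, and then walk each of those lists again to turn it into a number, at which point the list becomes garbage.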
This is a fairly ridiculous way to proceed. I believe you can even do better using something horrible like reads, but it would make more sense to use something like ReadP. You can also try fancier sorts of things like stream-based parsing; I don't know if that will help much or not.
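To make the ReadP suggestion concrete, here is a minimal sketch that parses the whole line in one pass over the String, checking for digits directly instead of going through words and read. The names int, ints and parseInts are hypothetical, and I have not measured how this performs on 10 million integers, so treat it as an illustration of the approach rather than a recommended fix.

```haskell
import Data.Char (digitToInt, isDigit)
import Data.List (foldl')
import Text.ParserCombinators.ReadP
  (ReadP, char, eof, munch1, option, readP_to_S, sepBy, skipSpaces)

-- | One decimal integer with an optional leading minus sign.
int :: ReadP Int
int = do
  sign   <- option id (char '-' >> return negate)
  digits <- munch1 isDigit
  return (sign (foldl' (\acc c -> acc * 10 + digitToInt c) 0 digits))

-- | Integers separated by runs of spaces, allowing surrounding whitespace.
ints :: ReadP [Int]
ints = do
  skipSpaces
  ns <- int `sepBy` munch1 (== ' ')
  skipSpaces
  eof
  return ns

-- | Run the parser; Nothing means the input was not a list of integers.
parseInts :: String -> Maybe [Int]
parseInts s = case readP_to_S ints s of
  [(ns, "")] -> Just ns
  _          -> Nothing
```

A caller could then write fmap sum (parseInts line) to get the total, or Nothing on malformed input.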