Read numbers from stdin into a Data.Vector.Unboxed.Vector Int64
Given a text file (to be piped in) containing many numbers separated by spaces, like so:
234 456 345 ...
What is the best way to read them all into a Data.Vector.Unboxed.Vector Int64? My current code looks like this:
import Control.Applicative
import Control.Arrow
import Data.Int
import Data.Maybe
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V
main :: IO ()
main = do
  v <- readInts <$> B.getContents
  print $ V.maximum v

-- split into separate stages for profiling
readInts :: B.ByteString -> V.Vector Int64
readInts = a >>> b >>> c >>> d

a = B.split ' '                          -- tokenize on spaces
b = mapMaybe (B.readInt >>> liftA fst)   -- parse each token, dropping failures
c = map fromIntegral                     -- Int -> Int64
d = V.fromList                           -- list -> unboxed vector
Here is the profiler output:
Thu Sep 18 16:08 2014 Time and Allocation Profiling Report (Final)
FastReadInts +RTS -p -K800M -RTS
total time = 0.51 secs (505 ticks @ 1000 us, 1 processor)
total alloc = 1,295,988,256 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
d Main 74.3 5.2
b Main 9.9 35.6
a Main 6.3 40.0
main Main 4.8 0.0
c Main 3.2 19.3
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 60 0 0.4 0.0 100.0 100.0
main Main 121 0 4.8 0.0 98.2 100.0
readInts Main 123 0 0.0 0.0 93.5 100.0
a Main 131 0 6.1 40.0 6.1 40.0
b Main 129 0 9.9 35.6 9.9 35.6
c Main 127 0 3.2 19.3 3.2 19.3
d Main 125 0 74.3 5.2 74.3 5.2
CAF Main 119 0 0.0 0.0 0.2 0.0
a Main 130 1 0.2 0.0 0.2 0.0
b Main 128 1 0.0 0.0 0.0 0.0
c Main 126 1 0.0 0.0 0.0 0.0
d Main 124 1 0.0 0.0 0.0 0.0
readInts Main 122 1 0.0 0.0 0.0 0.0
main Main 120 1 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 103 0 0.6 0.0 0.6 0.0
CAF GHC.IO.Encoding 96 0 0.2 0.0 0.2 0.0
CAF GHC.IO.Handle.Internals 93 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 83 0 0.2 0.0 0.2 0.0
CAF GHC.IO.Encoding.Iconv 81 0 0.2 0.0 0.2 0.0
The program is compiled and run this way:
ghc -O2 -prof -auto-all -rtsopts FastReadInts.hs
./FastReadInts +RTS -p -K800M < many_numbers.txt
many_numbers.txt is about 14 MB.
How can this bottleneck, i.e. V.fromList, be removed?
It is hard to answer questions like this without some expected level of performance or point of comparison. Simply by omitting profiling, your code runs in about 100 ms on a 21 MB ASCII file of random 64-bit numbers, which seems reasonable to me.
$ time ./so < randoms.txt
9223350746261547498
real 0m0.109s
user 0m0.094s
sys 0m0.013s
And the generation of the test data:
import System.Random

main = do
  g <- newStdGen
  -- 2^20 random values; note that values above 2^63 - 1 do not fit in an Int64
  let rs = take (2^20) $ randomRs (0,2^64) g :: [Integer]
  writeFile "randoms.txt" $ unwords (map show rs)
EDIT:
As requested:
import qualified Data.Vector.Unboxed.Mutable as M
...
listToVector :: [Int64] -> V.Vector Int64
listToVector ls = unsafePerformIO $ do
  m <- M.unsafeNew (2^20)                       -- uninitialised buffer; size is hard-coded
  zipWithM_ (M.unsafeWrite m) [0..(2^20)-1] ls  -- unchecked writes
  V.unsafeFreeze m
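The same pre-allocation idea can also be written without unsafePerformIO. The following is only a sketch: the name listToVectorN and its length parameter n are hypothetical additions, and it uses V.create with bounds-checked writes, so it is not expected to be faster than the version above.

```haskell
import Control.Monad (zipWithM_)
import Data.Int (Int64)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M

-- Hypothetical variant of listToVector: the caller supplies the element
-- count n instead of relying on a hard-coded 2^20.
listToVectorN :: Int -> [Int64] -> V.Vector Int64
listToVectorN n ls = V.create $ do
  m <- M.replicate n 0                    -- buffer of n elements, filled with 0
  zipWithM_ (M.write m) [0 .. n - 1] ls   -- bounds-checked; stops at the shorter list
  return m

main :: IO ()
main = print (listToVectorN 3 [10, 20, 30])
```

If the list is shorter than n, the remaining slots keep the fill value 0; if it is longer, the extra elements are ignored.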
I just wanted to note that pre-allocating a mutable vector does not affect performance much. In most cases the run time will be dominated by reading the file.
I have benchmarked both versions on 2^23 numbers, and it seems that the pre-allocated mutable array is even a bit slower.
benchmarking V.fromList
time 49.51 ms (47.65 ms .. 51.07 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 48.24 ms (47.82 ms .. 49.01 ms)
std dev 971.5 μs (329.1 μs .. 1.438 ms)
benchmarking listToVector
time 109.9 ms (106.2 ms .. 119.9 ms)
0.993 R² (0.975 R² .. 1.000 R²)
mean 109.3 ms (107.6 ms .. 113.8 ms)
std dev 4.041 ms (1.149 ms .. 6.129 ms)
And here is the code of the benchmark:
import Control.Applicative
import Control.Monad (zipWithM_)
import System.IO.Unsafe
import Data.Int
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import Criterion.Main
main :: IO ()
main = do
  let readInt x = let Just (i,_) = B.readInt x in fromIntegral i
  nums <- map readInt . B.words <$> B.readFile "randoms.txt"
  defaultMain
    [ bench "V.fromList" $ whnf (V.maximum . V.fromList) nums
    , bench "listToVector" $ whnf (V.maximum . listToVector) nums
    ]

listToVector :: [Int64] -> V.Vector Int64
listToVector ls = unsafePerformIO $ do
  m <- M.unsafeNew (2^23)
  zipWithM_ (M.unsafeWrite m) [0..(2^23)-1] ls
  V.unsafeFreeze m
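Both benchmarked versions still go through an intermediate list. A possible alternative, not measured in either answer, is to build the vector directly from the ByteString with V.unfoldr. The sketch below assumes whitespace-separated decimal integers that fit in an Int; the name readIntsUnfold is made up for illustration, and whether it is actually faster on a given input would have to be benchmarked.

```haskell
import Data.Char (isSpace)
import Data.Int (Int64)
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V

-- Parse whitespace-separated integers straight into an unboxed vector,
-- without materialising a list of tokens first.
readIntsUnfold :: B.ByteString -> V.Vector Int64
readIntsUnfold = V.unfoldr step
  where
    step s = case B.readInt (B.dropWhile isSpace s) of
      Nothing        -> Nothing                     -- end of input or a non-numeric token
      Just (n, rest) -> Just (fromIntegral n, rest) -- emit n, continue with the remainder

main :: IO ()
main = do
  v <- readIntsUnfold <$> B.getContents
  -- V.maximum fails on an empty vector, so guard against empty input
  if V.null v
    then putStrLn "no numbers read"
    else print (V.maximum v)
```

Unlike the mapMaybe-based readInts, this version stops at the first token that does not parse instead of skipping it.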