Why does only a minor change to the function design radically change the result of a criterion benchmark?
I have two source files that do roughly the same thing. The only difference is that in the first case a function is passed as a parameter, and in the second case a value is.
First case:
module Main where
import Data.Vector.Unboxed as UB
import qualified Data.Vector as V
import Criterion.Main
regularVectorGenerator :: (Int -> t) -> V.Vector t
regularVectorGenerator = V.generate 99999
unboxedVectorGenerator :: Unbox t => (Int -> t) -> UB.Vector t
unboxedVectorGenerator = UB.generate 99999
main :: IO ()
main = defaultMain
  [ bench "boxed"   $ whnf regularVectorGenerator (+2137)
  , bench "unboxed" $ whnf unboxedVectorGenerator (+2137)
  ]
Second case:
module Main where
import Data.Vector.Unboxed as UB
import qualified Data.Vector as V
import Criterion.Main
regularVectorGenerator :: Int -> V.Vector Int
regularVectorGenerator = flip V.generate (+2137)
unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator = flip UB.generate (+2137)
main :: IO ()
main = defaultMain
  [ bench "boxed"   $ whnf regularVectorGenerator 99999
  , bench "unboxed" $ whnf unboxedVectorGenerator 99999
  ]
What I noticed during benchmarking is that the unboxed vector is, as expected, always smaller, yet the measured size of both vectors varies drastically between the two cases. Here is the output of the first case:
benchmarking boxed
time 7.626 ms (7.515 ms .. 7.738 ms)
0.999 R² (0.998 R² .. 0.999 R²)
mean 7.532 ms (7.472 ms .. 7.583 ms)
std dev 164.3 μs (133.8 μs .. 201.3 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **1.680e7** (1.680e7 .. 1.680e7)
y 2357.390 (1556.690 .. 3422.724)
benchmarking unboxed
time 889.1 μs (878.9 μs .. 901.8 μs)
0.998 R² (0.995 R² .. 0.999 R²)
mean 868.6 μs (858.6 μs .. 882.6 μs)
std dev 39.05 μs (28.30 μs .. 57.02 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **4000009.003** (4000003.843 .. 4000014.143)
y 2507.089 (2025.196 .. 3035.962)
variance introduced by outliers: 36% (moderately inflated)
and the second case:
benchmarking boxed
time 1.366 ms (1.357 ms .. 1.379 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 1.350 ms (1.343 ms .. 1.361 ms)
std dev 29.96 μs (21.74 μs .. 43.56 μs)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **2400818.350** (2400810.284 .. 2400826.685)
y 2423.216 (1910.901 .. 3008.024)
variance introduced by outliers: 12% (moderately inflated)
benchmarking unboxed
time 61.30 μs (61.24 μs .. 61.37 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 61.29 μs (61.25 μs .. 61.33 μs)
std dev 122.1 ns (91.64 ns .. 173.9 ns)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters **800040.029** (800039.745 .. 800040.354)
y 2553.830 (2264.684 .. 2865.637)
The benchmarked size of the vector decreased by an order of magnitude just by de-parametrizing the function. Can someone explain to me why?
I compiled both examples with these flags:
-O2 -rtsopts
and launched with:
--regress allocated:iters +RTS -T
The difference is that when the generating function is already known inside the benchmarked function, the generator is inlined and the involved `Int`s are unboxed as well. When the generating function is the benchmark parameter, it cannot be inlined.
From the benchmarking perspective the second version is the correct one, since in normal usage we want the generating function to be inlined.
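The inlining effect can be reproduced without criterion or vector. Below is a minimal sketch (my own illustration, not code from the question): a NOINLINE pragma stands in for the opaque benchmark parameter, keeping `f` an unknown closure, while the second loop lets GHC see the generating function and compile unboxed `Int#` arithmetic. Both compute the same sum; only the generated code differs.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Opaque case: NOINLINE mimics a function argument GHC cannot see
-- through, so `f` remains a closure call and its Int results stay boxed.
sumWith :: (Int -> Int) -> Int
sumWith f = go 0 0
  where
    go !acc !i
      | i >= 99999 = acc
      | otherwise  = go (acc + f i) (i + 1)
{-# NOINLINE sumWith #-}

-- Known case: the generating function is visible at the use site, so
-- GHC inlines it and the loop runs on unboxed machine integers.
sumKnown :: Int
sumKnown = go 0 0
  where
    go !acc !i
      | i >= 99999 = acc
      | otherwise  = go (acc + (i + 2137)) (i + 1)

main :: IO ()
main = print (sumWith (+2137) == sumKnown)  -- prints True
```

Benchmarking `sumWith (+2137)` against `sumKnown` with criterion should show the same pattern as in the question: the opaque version allocates a boxed `Int` per element, the known version allocates almost nothing. Note that NOINLINE only approximates what happens when the function is a benchmark parameter; the exact mechanism there is that `whnf`'s argument is opaque to the optimizer.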