Why does a minor change to function design radically change the result of a criterion benchmark?

I have two source files that do roughly the same thing. The only difference is that in the first case the generating function is passed as the parameter, and in the second case the vector length is.

First case:

module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V

import Criterion.Main

regularVectorGenerator :: (Int -> t) -> V.Vector t
regularVectorGenerator = V.generate 99999

unboxedVectorGenerator :: Unbox t => (Int -> t) -> UB.Vector t
unboxedVectorGenerator = UB.generate 99999

main :: IO ()
main = defaultMain
    [
        bench "boxed"   $ whnf regularVectorGenerator (+2137)
      , bench "unboxed" $ whnf unboxedVectorGenerator (+2137)
    ]

Second case:

module Main where

import Data.Vector.Unboxed as UB
import qualified Data.Vector as V

import Criterion.Main

regularVectorGenerator :: Int -> V.Vector Int
regularVectorGenerator = flip V.generate (+2137)

unboxedVectorGenerator :: Int -> UB.Vector Int
unboxedVectorGenerator = flip UB.generate (+2137)

main :: IO ()
main = defaultMain
    [
        bench "boxed"   $ whnf regularVectorGenerator 99999
      , bench "unboxed" $ whnf unboxedVectorGenerator 99999
    ]

What I noticed is that during benchmarking the measured size of the unboxed vector (the allocated bytes per iteration, marked with ** in the output) is, as expected, always smaller, yet the sizes of both vectors vary drastically between the two cases. Here is the output of the first case:

 benchmarking boxed
 time                 7.626 ms   (7.515 ms .. 7.738 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
 mean                 7.532 ms   (7.472 ms .. 7.583 ms)
 std dev              164.3 μs   (133.8 μs .. 201.3 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **1.680e7**    (1.680e7 .. 1.680e7)
   y                  2357.390   (1556.690 .. 3422.724)

 benchmarking unboxed
 time                 889.1 μs   (878.9 μs .. 901.8 μs)
                     0.998 R²   (0.995 R² .. 0.999 R²)
 mean                 868.6 μs   (858.6 μs .. 882.6 μs)
 std dev              39.05 μs   (28.30 μs .. 57.02 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **4000009.003** (4000003.843 .. 4000014.143)
   y                  2507.089   (2025.196 .. 3035.962)
 variance introduced by outliers: 36% (moderately inflated)

and of the second case:

 benchmarking boxed
 time                 1.366 ms   (1.357 ms .. 1.379 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
 mean                 1.350 ms   (1.343 ms .. 1.361 ms)
 std dev              29.96 μs   (21.74 μs .. 43.56 μs)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **2400818.350** (2400810.284 .. 2400826.685)
   y                  2423.216   (1910.901 .. 3008.024)
 variance introduced by outliers: 12% (moderately inflated)

 benchmarking unboxed
 time                 61.30 μs   (61.24 μs .. 61.37 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
 mean                 61.29 μs   (61.25 μs .. 61.33 μs)
 std dev              122.1 ns   (91.64 ns .. 173.9 ns)
 allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
   iters              **800040.029** (800039.745 .. 800040.354)
   y                  2553.830   (2264.684 .. 2865.637)

The benchmarked size of the vector decreased by roughly an order of magnitude just by de-parametrizing the function. Can someone explain why?

I compiled both examples with these flags:

-O2 -rtsopts

and launched them with:

--regress allocated:iters +RTS -T

The difference is that when the generating function is already known inside the benchmarked function, the generator is inlined and the Ints involved are unboxed as well. When the generating function is the benchmark parameter, it cannot be inlined.
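The two shapes can be put side by side in a minimal sketch. The names generateWith and generateFixed are hypothetical, the element type is pinned to Int as an assumption so that both versions build the same vector, and the NOINLINE pragma is only there to imitate a call site where the generator stays unknown, as it does when criterion receives it as the benchmark argument.

```haskell
import qualified Data.Vector.Unboxed as UB

-- Generator arrives as an argument. While compiling this definition GHC
-- does not know what g is, so each element goes through an unknown call.
-- NOINLINE (an assumption of this sketch) keeps it that way at call sites.
generateWith :: (Int -> Int) -> UB.Vector Int
generateWith g = UB.generate 99999 g
{-# NOINLINE generateWith #-}

-- Generator is fixed in the definition. GHC can see (+2137) here, so it is
-- free to inline it into the loop that fills the vector.
generateFixed :: Int -> UB.Vector Int
generateFixed n = UB.generate n (+2137)
```

Both generateWith (+2137) and generateFixed 99999 produce the same vector; only the code GHC is able to generate for them differs.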

From a benchmarking perspective the second version is the correct one, since in normal usage we want the generating function to be inlined.
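As an illustration of that "normal usage", here is a hedged sketch of ordinary (non-benchmark) code in which the generator is still a parameter. The names generateN and shifted are hypothetical. The INLINE pragma asks GHC to unfold generateN at saturated call sites, which can make the generator known there again; whether this actually happens depends on the optimisation level and the surrounding code.

```haskell
import qualified Data.Vector.Unboxed as UB

-- A wrapper that keeps the generator as a parameter. INLINE asks GHC to
-- unfold the wrapper wherever it is applied to its argument.
generateN :: UB.Unbox t => (Int -> t) -> UB.Vector t
generateN g = UB.generate 99999 g
{-# INLINE generateN #-}

-- A call site where the generator is known: after unfolding generateN,
-- GHC can see (+2137) next to UB.generate, as in the second benchmark.
shifted :: UB.Vector Int
shifted = generateN (+2137)
```

Note that passing generateN itself to whnf, as the first benchmark does with its generator functions, leaves it unapplied at that point, so a pragma like this would not be expected to help there.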
