Haskell benchmarking/Optimization of nf/whnf of non-strict reduction

I am trying to optimize a library that takes a large data set and applies different operations to it. Now that the library is working, I want to make it faster.

I am under the impression that non-strict evaluation allows GHC to combine operations so that the data is only iterated over once, provided all of the functions are written with their arguments ordered to facilitate WHNF reduction (and potentially to reduce the number of operations performed on each datum).

To test this I wrote the following code:

import Criterion.Main

main = defaultMain
       [ bench "warmup (whnf)" $ whnf putStrLn "HelloWorld",
         bench "single (whnf)" $ whnf single [1..10000000],
         bench "single (nf)"   $ nf   single [1..10000000],
         bench "double (whnf)" $ whnf double [1..10000000],
         bench "double (nf)"   $ nf   double [1..10000000]]

single :: [Int] -> [Int]
single lst = fmap (* 2) lst

double :: [Int] -> [Int]             
double lst =  fmap (* 3) $ fmap (* 2) lst

Benchmarking with the Criterion library, I get the following results:

benchmarking warmup (whnf)
mean: 13.72408 ns, lb 13.63687 ns, ub 13.81438 ns, ci 0.950
std dev: 455.7039 ps, lb 409.6489 ps, ub 510.8538 ps, ci 0.950

benchmarking single (whnf)
mean: 15.88809 ns, lb 15.79157 ns, ub 15.99774 ns, ci 0.950
std dev: 527.8374 ps, lb 458.6027 ps, ub 644.3497 ps, ci 0.950

benchmarking single (nf)
collecting 100 samples, 1 iterations each, in estimated 107.0255 s
mean: 195.4457 ms, lb 195.0313 ms, ub 195.9297 ms, ci 0.950
std dev: 2.299726 ms, lb 2.006414 ms, ub 2.681129 ms, ci 0.950

benchmarking double (whnf)
mean: 15.24267 ns, lb 15.17950 ns, ub 15.33299 ns, ci 0.950
std dev: 384.3045 ps, lb 288.1722 ps, ub 507.9676 ps, ci 0.950

benchmarking double (nf)
collecting 100 samples, 1 iterations each, in estimated 20.56069 s
mean: 205.3217 ms, lb 204.9625 ms, ub 205.8897 ms, ci 0.950
std dev: 2.256761 ms, lb 1.590083 ms, ub 3.324734 ms, ci 0.950

Does GHC optimize the "double" function so that the list is only operated on once, by (* 6)? The nf results show that this is the case, because otherwise the mean computation time for "double" would be twice that of "single".

What is the difference that makes the whnf version run so fast? I can only assume that nothing is actually being performed (or just the first iteration in the reduction).

Am I even using the correct terminology?

Looking at the Core (intermediate code) generated by GHC with the -ddump-simpl option, we can confirm that GHC does indeed fuse the two applications of map into one (when compiling with -O2). The relevant parts of the dump are:

Main.main10 :: GHC.Types.Int -> GHC.Types.Int
GblId
[Arity 1
 NoCafRefs]
Main.main10 =
  \ (x_a1Ru :: GHC.Types.Int) ->
    case x_a1Ru of _ { GHC.Types.I# x1_a1vc ->
    GHC.Types.I# (GHC.Prim.*# (GHC.Prim.+# x1_a1vc 2) 3)
    }

Main.double :: [GHC.Types.Int] -> [GHC.Types.Int]
GblId
[Arity 1
 NoCafRefs
 Str: DmdType S]
Main.double =
  \ (lst_a1gF :: [GHC.Types.Int]) ->
    GHC.Base.map @ GHC.Types.Int @ GHC.Types.Int Main.main10 lst_a1gF

Note how there is only one use of GHC.Base.map in Main.double, referring to the combined function Main.main10, which both adds 2 and multiplies by 3. This is likely a result of GHC first inlining the Functor instance for lists so that fmap becomes map, and then applying a rewrite rule that allows two applications of map to be fused, plus some more inlining and other optimizations.
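To make the idea of a fusing rewrite rule concrete, here is a minimal sketch. The names myMap, unfused and fused are hypothetical and exist only for this illustration. GHC's real list fusion is expressed through foldr/build rules in GHC.Base rather than a direct map/map rule, so this shows the general mechanism, not what GHC literally does for map. The functions mirror the (+ 2) and (* 3) seen in Main.main10 above.

```haskell
-- Hypothetical stand-in for map. NOINLINE keeps calls to it visible
-- so the rewrite rule below has something to match.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Illustrative rule: two traversals become one traversal with a composed function.
-- Rules are only applied when optimization is enabled (e.g. -O or -O2).
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

-- Two passes as written in the source.
unfused :: [Int] -> [Int]
unfused lst = myMap (* 3) (myMap (+ 2) lst)

-- What the rule (followed by inlining of the composition) would amount to.
fused :: [Int] -> [Int]
fused = myMap (\x -> (x + 2) * 3)

main :: IO ()
main = print (unfused [1 .. 5] == fused [1 .. 5])  -- prints True
```

Both versions compute the same list regardless of whether the rule fires; the rule only changes how many intermediate lists are built.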

WHNF means that the expression is only evaluated to the "outermost" data constructor or lambda. In this case, that means the first (:) constructor. That's why it's so much faster, since almost no work is being done. See my answer to "What is Weak Head Normal Form?" for more details.
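The difference can be observed directly with a small sketch, assuming the deepseq package (which ships with GHC) is available. The name partial is made up for this example. The list has a valid first cell but an undefined tail, so forcing it to WHNF is harmless while forcing it to normal form is not.

```haskell
import Control.DeepSeq (force)
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical test value: the first cons cell exists, the tail crashes if demanded.
partial :: [Int]
partial = fmap (* 2) (1 : undefined)

main :: IO ()
main = do
  -- WHNF: only the outermost (:) is demanded, so the undefined tail is never touched.
  _ <- evaluate partial
  putStrLn "WHNF evaluation succeeded"
  -- NF: every element and the whole spine are demanded, so the undefined tail is hit.
  r <- try (evaluate (force partial)) :: IO (Either SomeException [Int])
  putStrLn (either (const "NF evaluation hit undefined")
                   (const "NF evaluation succeeded")
                   r)
```

This is essentially the difference between Criterion's whnf and nf: the former stops at the first constructor, the latter walks the entire result.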
