Haskell: Is map length . group slower than explicit recursion?

Question

Consider this trivial algorithm for the prime decomposition of an integer n: Let d' be the divisor of n last found. Initially, set d' = 1. Find the smallest divisor d > d' of n, and find the maximal value e such that d^e divides n. Append d^e to the answer and repeat the procedure on n/d^e. Finally, stop when n becomes 1. For simplicity, let's ignore mathematical optimizations, like stopping at sqrt n, etc.

I have implemented it in two ways. The first one generates a list of division "attempts", and then groups the successful ones by divisor. For example, for n = 20, we first generate [(2,20),(2,10),(2,5),(3,5),(4,5),(5,5),(5,1)], which we then transform to the desired [(2,2),(5,1)] using group and other library functions.
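To make the grouping step concrete, here is a small sketch that runs the same stages as implement1's factorGroups and final map by hand on the n = 20 trial list quoted above. The names trials20, successfulRuns and toPairs are hypothetical, introduced only for this illustration.

```haskell
import Data.List (group)

-- Division attempts for n = 20, copied from the example in the question.
trials20 :: [(Int, Int)]
trials20 = [(2,20),(2,10),(2,5),(3,5),(4,5),(5,5),(5,1)]

-- Each run of equal divisors starts with the state in which d is about to be
-- tried; every further entry in the run records one successful division by d.
-- Dropping that first entry (tail) leaves one element per division, and runs
-- for divisors that never divided n become empty and are filtered out.
successfulRuns :: [(Int, Int)] -> [[Int]]
successfulRuns = filter (not . null) . map tail . group . map fst

-- Same final step as implement1: (divisor, exponent) pairs.
toPairs :: [[Int]] -> [(Int, Int)]
toPairs = map (\xs -> (head xs, length xs))

main :: IO ()
main = do
  print (map fst trials20)                   -- [2,2,2,3,4,5,5]
  print (group (map fst trials20))           -- [[2,2,2],[3],[4],[5,5]]
  print (successfulRuns trials20)            -- [[2,2],[5]]
  print (toPairs (successfulRuns trials20))  -- [(2,2),(5,1)]
```

Note that group sees one element per division attempt, not one per actual factor, so its input is as long as the whole trial list.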

The second implementation is an explicit recursion that keeps track of the exponent e along the way, appends d^e to the answer once the maximal e is reached, proceeds to find the "next" d, and so on.

Question 1: Why does the first implementation run so much slower than the second, despite the following:

Both implementations execute div, the core step of the algorithm, roughly the same number of times.
Lazy evaluation (and fusion?) has the effect that the long list illustrated above never has to be materialized in the first place. As you can see in the code below, divTrials n, the list I am talking about, is transformed by a chain of higher-order functions. In particular, I think that the part map (\xs-> (head xs,length xs)) ... group should tell the compiler that the list is just intermediate:

{-# OPTIONS_GHC -O2 #-}
module GroupCheck where
import Data.List
import Data.Maybe

implement1 :: Integral t=> t -> [(t,Int)]                    -- IMPLEMENTATION 1
implement1  = map (\xs-> (head xs,length xs)).factorGroups where
  tryDiv (d,n)
    | n `mod` d == 0 = (d,n `div` d)
    | n == 1 = (1,1) -- hack
    | otherwise = (d+1,n)
  divTrials n = takeWhile (/=(1,1)) $ (2,n): map tryDiv (divTrials n)
  factorGroups = filter (not.null).map tail.group.map fst.divTrials

implement2 :: Show t => Integral t => t -> [(t,Int)]         -- IMPLEMENTATION 2
implement2 num = keep2 $ tail $ go (1,0,1,num) where
  range d n = [d+1..n]
  nextd d n = fromMaybe n $ find ((0==).(n`mod`)) (range d n)
  update (d,e,de,n)
    | n `mod` d == 0 = update (d,e+1,de*d,n`div`d)
    | otherwise      = (d,e,de,n)
  go (d,e,de,1) = [(d,e,de,1)]
  go (d,e,de,n) = (d,e,de,n) : go (update (nextd d n,0,1,n))
  keep2 = map (\(d,e,_,_)->(d,e))

main :: IO ()
main = do
  let n = 293872
  let ans1 = implement1 n 
  let ans2 = implement2 n
  print ans1
  print ans2

Profiling shows that tryDiv and divTrials together take up more than 99% of the total execution time:

> stack ghc -- -main-is GroupCheck.main -prof -fprof-auto -rtsopts GroupCheck 
> ./GroupCheck +RTS -p >/dev/null && cat GroupCheck.prof


           GroupCheck +RTS -p -RTS

        total time  =       18.34 secs   (18338 ticks @ 1000 us, 1 processor)
        total alloc = 17,561,404,568 bytes  (excludes profiling overheads)

COST CENTRE          MODULE     SRC                          %time %alloc

implement1.divTrials GroupCheck GroupCheck.hs:12:3-69         52.6   69.2
implement1.tryDiv    GroupCheck GroupCheck.hs:(8,3)-(11,25)   47.2   30.8

Question 1.5: So... what's so bad about these functions? Also,

Question 2: In the more general case of aggregating contiguous blocks of identical elements from a nondecreasing sequence, should we go the bulky implement2 way if we want speed? (Again, ignoring domain-specific optimizations.)
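For the general case, a minimal sketch of the two styles on an arbitrary list might look like this; runLengthsGroup and runLengthsRec are hypothetical names. The first is the map-over-group combination used in implement1 (here via Data.List.NonEmpty.group, whose runs are non-empty, so taking the head is total). The second is a single-pass explicit recursion in the spirit of implement2. Both merge only adjacent equal elements, which is why the input must be sorted (e.g. nondecreasing) for each value to end up in one block.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.List.NonEmpty as NE

-- Library version: group adjacent equal elements, then measure each run.
runLengthsGroup :: Eq a => [a] -> [(a, Int)]
runLengthsGroup = map (\run -> (NE.head run, NE.length run)) . NE.group

-- Hand-written version: carry the current element and a strict count,
-- emitting a pair whenever the element changes.
runLengthsRec :: Eq a => [a] -> [(a, Int)]
runLengthsRec []       = []
runLengthsRec (x : xs) = go x 1 xs
  where
    go cur !k [] = [(cur, k)]
    go cur !k (y : ys)
      | y == cur  = go cur (k + 1) ys
      | otherwise = (cur, k) : go y 1 ys

main :: IO ()
main = do
  let xs = [2, 2, 5, 7, 7, 7 :: Int]
  print (runLengthsGroup xs)  -- [(2,2),(5,1),(7,3)]
  print (runLengthsRec xs)    -- [(2,2),(5,1),(7,3)]
```

Both return the same pairs; the explicit version avoids building the intermediate list of runs, but whether that matters in a given program is something to measure rather than assume.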

Or did I totally miss something obvious? Thanks!

Answer 1

Just to establish a baseline, I ran your program on a slightly larger starting number (so that time didn't print out 0.00s). I chose n = 2938722345623 for no particular reason. Here are the timings before starting to tweak things:

ans1: indistinguishable from infinity (I finished writing this entire answer and it was still running, about 26 minutes in total)
ans2: 2.78s

The first thing to try is to tweak this line:

divTrials n = takeWhile (/=(1,1)) $ (2,n): map tryDiv (divTrials n)

This looks like a pretty natural definition, but it turns out that GHC never memoizes function calls: the recursive call divTrials n builds a brand-new copy of the list instead of reusing the one being defined. So if you want a list that is defined recursively in terms of itself, the recursion must refer to the list by name rather than make a function call. Here's how:

divTrials n = xs where xs = takeWhile (/=(1,1)) $ (2,n): map tryDiv xs
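The shared definition builds the same list that Prelude's iterate produces, so another way to avoid the pitfall is to write divTrials with iterate, which computes each element from the previous one exactly once. A sketch, with tryDiv copied from implement1 and lifted to the top level; specialising it to Int is an assumption made only for this example.

```haskell
-- tryDiv as in implement1, specialised to Int for this sketch.
tryDiv :: (Int, Int) -> (Int, Int)
tryDiv (d, n)
  | n `mod` d == 0 = (d, n `div` d)
  | n == 1         = (1, 1)  -- sentinel marking the end of the trials
  | otherwise      = (d + 1, n)

-- Same list as: divTrials n = xs where xs = takeWhile (/=(1,1)) $ (2,n) : map tryDiv xs
-- iterate f x = [x, f x, f (f x), ...], so no element is ever recomputed.
divTrials :: Int -> [(Int, Int)]
divTrials n = takeWhile (/= (1, 1)) (iterate tryDiv (2, n))

main :: IO ()
main = print (divTrials 20)
-- [(2,20),(2,10),(2,5),(3,5),(4,5),(5,5),(5,1)]
```

With iterate there is no recursive call to divTrials at all, so the sharing does not depend on how the definition is phrased.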

That change alone brings the time down to 7.85s. Still slower by a factor of about 3, but much, much better.
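A back-of-the-envelope count suggests why the original definition is so expensive. Assume, as stated above, that each recursive call divTrials n builds a fresh, unshared list, and let m be the number of entries in the trial list. Producing element k of a list requires element k - 1 of the next nested copy, so the copy at nesting depth j is forced to about m - j elements and performs about m - j applications of tryDiv. Summing over all depths:

$$\sum_{j=0}^{m-1} (m - j) = \frac{m(m+1)}{2} \approx \frac{m^2}{2}$$

With sharing, each element is computed once, so the count is about m. For the question's n = 293872 = 2^4 · 18367 (18367 is prime), the trial list has m = 18371 entries, which gives roughly 1.7 × 10^8 tryDiv calls instead of about 1.8 × 10^4. That quadratic blow-up fits the profile in the question, where divTrials and tryDiv account for almost all of the run time.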

The less obvious problem lies here:

factorGroups = filter (not.null).map tail.group.map fst.divTrials

Putting the group so early breaks fusion, causing that intermediate list to actually be materialized. This means allocating and deallocating a lot of cons cells and tuples. Here's an implementation in the same spirit that does more of the work before the group:

  tryDiv d n
    | n `mod` d == 0 = d : tryDiv d (n `div` d)
    | n == 1 = []
    | otherwise = tryDiv (d+1) n
  factorGroups = group . tryDiv 2

With that, we are down to 2.65s, slightly faster than ans2, though I only ran each version once, so the difference is most likely just measurement noise.
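Lifted out of the where clause, the rewritten helpers can be run on their own to see what group now receives. This is a sketch, specialised to Int as an assumption, and factorize is a hypothetical name for the whole pipeline. Instead of one entry per division attempt, the list handed to group contains only the prime factors, with multiplicity.

```haskell
import Data.List (group)

-- The rewritten tryDiv, specialised to Int for this sketch:
-- it emits d once per successful division and skips failed attempts.
tryDiv :: Int -> Int -> [Int]
tryDiv d n
  | n `mod` d == 0 = d : tryDiv d (n `div` d)
  | n == 1         = []
  | otherwise      = tryDiv (d + 1) n

-- (divisor, exponent) pairs, the same shape implement1 returns.
factorize :: Int -> [(Int, Int)]
factorize = map (\xs -> (head xs, length xs)) . group . tryDiv 2

main :: IO ()
main = do
  print (tryDiv 2 20)          -- [2,2,5]
  print (group (tryDiv 2 20))  -- [[2,2],[5]]
  print (factorize 20)         -- [(2,2),(5,1)]
  print (factorize 293872)     -- [(2,4),(18367,1)]
```

For n = 20, group now sees three elements rather than the seven division attempts, and the filter/tail bookkeeping from the original factorGroups is no longer needed.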

Solution 1
1 2021-10-21 15:24:46