Haskell: map length . group is way slower than explicit recursion?
Consider this trivial algorithm for the prime decomposition of an integer n. Let d' be the divisor of n last found. Initially, set d' = 1. Find the smallest divisor d > d' of n, and find the maximal value e such that d^e divides n. Append d^e to the answer and repeat the procedure on n/d^e. Finally, stop when n becomes 1. For simplicity, let's ignore mathematical optimizations such as stopping at sqrt n.
I have implemented it in two ways. The first one generates a list of division "attempts", and then groups the successful ones by divisor. For example, for n = 20, we first generate [(2,20),(2,10),(2,5),(3,5),(4,5),(5,5),(5,1)], which we then transform to the desired [(2,2),(5,1)] using group and other library functions.
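
The stages of that transformation can be traced by hand for n = 20. The sketch below starts from the literal list of attempts quoted above; the names `trials`, `divisors`, `runs`, `successes` and `result` are illustrative and do not appear in the original code. It uses `drop 1` and a list comprehension where the question's code uses `tail` and `head`, which gives the same result on the non-empty runs that `group` produces.

```haskell
import Data.List (group)

-- Division attempts for n = 20, copied from the text above.
trials :: [(Int, Int)]
trials = [(2,20),(2,10),(2,5),(3,5),(4,5),(5,5),(5,1)]

-- Keep only the divisor tried at each step: [2,2,2,3,4,5,5]
divisors :: [Int]
divisors = map fst trials

-- Contiguous runs of equal divisors: [[2,2,2],[3],[4],[5,5]]
runs :: [[Int]]
runs = group divisors

-- Each run holds one entry more than the number of successful divisions,
-- so drop one entry per run and discard runs that become empty: [[2,2],[5]]
successes :: [[Int]]
successes = filter (not . null) (map (drop 1) runs)

-- Pair each remaining divisor with its exponent: [(2,2),(5,1)]
result :: [(Int, Int)]
result = [ (d, length ds) | ds@(d:_) <- successes ]
```

Loading this in GHCi and evaluating each name in turn shows why the `map tail` step is needed: a divisor that never divides n still appears once in the list of attempts.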
The second implementation is an explicit recursion that keeps track of the exponent e along the way, appends d^e to the answer once the maximal e is reached, proceeds to find the "next" d, and so on.
Question 1: Why does the first implementation run way slower than the second, despite the following:

- Both implementations call div, the core step of the algorithm, roughly the same number of times.
- divTrials n, the list I am talking about, is transformed by a chain of higher-order functions. In that, I think that the part map (\xs-> (head xs,length xs)) ... group should tell the compiler that the list is just intermediate:

    {-# OPTIONS_GHC -O2 #-}
    module GroupCheck where
    import Data.List
    import Data.Maybe

    implement1 :: Integral t=> t -> [(t,Int)] -- IMPLEMENTATION 1
    implement1 = map (\xs-> (head xs,length xs)).factorGroups where
      tryDiv (d,n)
        | n `mod` d == 0 = (d,n `div` d)
        | n == 1 = (1,1) -- hack
        | otherwise = (d+1,n)
      divTrials n = takeWhile (/=(1,1)) $ (2,n): map tryDiv (divTrials n)
      factorGroups = filter (not.null).map tail.group.map fst.divTrials

    implement2 :: Show t => Integral t => t -> [(t,Int)] -- IMPLEMENTATION 2
    implement2 num = keep2 $ tail $ go (1,0,1,num) where
      range d n = [d+1..n]
      nextd d n = fromMaybe n $ find ((0==).(n`mod`)) (range d n)
      update (d,e,de,n)
        | n `mod` d == 0 = update (d,e+1,de*d,n`div`d)
        | otherwise = (d,e,de,n)
      go (d,e,de,1) = [(d,e,de,1)]
      go (d,e,de,n) = (d,e,de,n) : go (update (nextd d n,0,1,n))
      keep2 = map (\(d,e,_,_)->(d,e))

    main :: IO ()
    main = do
      let n = 293872
      let ans1 = implement1 n
      let ans2 = implement2 n
      print ans1
      print ans2

Profiling tells us that tryDiv and divTrials together eat up >99% of the entire execution time:

    > stack ghc -- -main-is GroupCheck.main -prof -fprof-auto -rtsopts GroupCheck
    > ./GroupCheck +RTS -p >/dev/null && cat GroupCheck.prof
    GroupCheck +RTS -p -RTS

    total time  = 18.34 secs (18338 ticks @ 1000 us, 1 processor)
    total alloc = 17,561,404,568 bytes (excludes profiling overheads)

    COST CENTRE           MODULE      SRC                           %time  %alloc
    implement1.divTrials  GroupCheck  GroupCheck.hs:12:3-69          52.6    69.2
    implement1.tryDiv     GroupCheck  GroupCheck.hs:(8,3)-(11,25)    47.2    30.8

Question 1.5: So... what's so bad about these functions? Also,
Question 2: In the more general case of having to aggregate contiguous blocks of identical elements from a nondecreasing sequence, should we go the bulky implement2 way if we want speed? (Again, ignoring domain-specific optimizations.)
Or did I totally miss something obvious? Thanks!
Just to establish a baseline, I ran your program on a slightly larger starting number (so that time didn't print out 0.00s). I chose n = 2938722345623 for no particular reason. Here are the timings before starting to tweak things:

- ans1: indistinguishable from infinity (I finished writing this entire answer and it was still running, about 26 minutes in total)
- ans2: 2.78s

The first thing to try is to tweak this line:

    divTrials n = takeWhile (/=(1,1)) $ (2,n): map tryDiv (divTrials n)

This looks like a pretty natural definition, but it turns out that GHC never memoizes function calls. So if you want to make a list that's defined recursively in terms of itself, you must not make a function call in the recursion. Here's how:

    divTrials n = xs where xs = takeWhile (/=(1,1)) $ (2,n): map tryDiv xs

Just that change brings the time down to 7.85s. Still off by a factor of about 3, but much, much better.
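
The effect of that one-line change can be seen in isolation. In the sketch below, `divTrialsRecomputed` is the original definition and `divTrialsShared` is the fixed one. `divTrialsIterate` is an alternative that is not taken from the answer: it uses the Prelude's `iterate`, which should produce the same sequence without writing the self-reference by hand. All three names are illustrative, `tryDiv` is copied from the question, and n ≥ 1 is assumed.

```haskell
-- One division attempt, as defined in the question.
tryDiv :: Integral t => (t, t) -> (t, t)
tryDiv (d, n)
  | n `mod` d == 0 = (d, n `div` d)
  | n == 1         = (1, 1)   -- sentinel that ends the list
  | otherwise      = (d + 1, n)

-- Original: the inner call builds a fresh list every time, so computing
-- the k-th element redoes the work for all earlier elements.
divTrialsRecomputed :: Integral t => t -> [(t, t)]
divTrialsRecomputed n =
  takeWhile (/= (1, 1)) $ (2, n) : map tryDiv (divTrialsRecomputed n)

-- Fixed: xs refers to itself, so each element is computed once and shared.
divTrialsShared :: Integral t => t -> [(t, t)]
divTrialsShared n = xs
  where xs = takeWhile (/= (1, 1)) $ (2, n) : map tryDiv xs

-- Alternative (an assumption, not from the answer): iterate f x yields
-- x, f x, f (f x), ... which is the same sequence of attempts.
divTrialsIterate :: Integral t => t -> [(t, t)]
divTrialsIterate n = takeWhile (/= (1, 1)) (iterate tryDiv (2, n))
```

All three return the same list for small inputs such as 20; only the amount of repeated work differs.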
The less obvious problem lies here:

    factorGroups = filter (not.null).map tail.group.map fst.divTrials

Putting the group so early breaks fusion, causing that intermediate list to actually be materialized. This means allocating and deallocating a lot of cons cells and tuples. Here's an implementation that has the same spirit, but puts more work before the group:

    tryDiv d n
      | n `mod` d == 0 = d : tryDiv d (n `div` d)
      | n == 1 = []
      | otherwise = tryDiv (d+1) n
    factorGroups = group . tryDiv 2

With that, we are down to 2.65s, slightly faster than ans2, though I only did one test of each so it's pretty likely to just be measurement noise.
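
For reference, here is a minimal sketch of how the rewritten tryDiv could be assembled into a complete replacement for implement1. The names `factorList` and `factorize` are illustrative, the list comprehension stands in for `map (\xs-> (head xs,length xs))` from the question, and n ≥ 1 is assumed (for n ≤ 0 the recursion does not terminate).

```haskell
import Data.List (group)

-- Emits each prime factor once per successful division,
-- so 20 gives [2,2,5].
factorList :: Integral t => t -> [t]
factorList = tryDiv 2
  where
    tryDiv d n
      | n `mod` d == 0 = d : tryDiv d (n `div` d)
      | n == 1         = []
      | otherwise      = tryDiv (d + 1) n

-- Collapse runs of equal factors into (prime, exponent) pairs.
factorize :: Integral t => t -> [(t, Int)]
factorize n = [ (d, length ds) | ds@(d:_) <- group (factorList n) ]
```

Because tryDiv now emits a divisor only when a division succeeds, the `map tail` and `filter (not.null)` steps of the earlier factorGroups are no longer needed.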