
Reasoning laziness

I have the following snippet:

import qualified Data.Vector as V
import qualified Data.ByteString.Lazy as BL
import System.Environment
import Data.Word
import qualified Data.List.Stream as S

histogram ::  [Word8] -> V.Vector Int
histogram c = V.accum (+) (V.replicate 256 0) $ S.zip (map fromIntegral c) (S.repeat 1)

mkHistogram file = do
  hist <- (histogram . BL.unpack) `fmap` BL.readFile file
  print hist

Here is how I picture it: nothing is done until printing. When printing, the thunks are unwound by first unpacking, then mapping fromIntegral over one Word8 at a time. Each of these Word8s is zipped with 1, again one value at a time. These tuples are then taken by the accumulator function, which updates the array one tuple/Word8 at a time. Then we move to the next thunk and repeat until there is no content left.

This would allow creating histograms in constant memory, but alas that is not what happens: it crashes with a stack overflow instead. If I profile it, I see it run to the end, but it takes a lot of memory (300-500 MB for a 2.5 MB file). Memory grows linearly until the very end, when it can be released, forming a "nice" triangular graph.

Where did my reasoning go wrong, and what steps should I take to make this run in constant memory?

I believe the problem is that Data.Vector is not strict in its elements. So although your reasoning is right, while the histogram is being accumulated the vector's contents look like:

<1+(1+(1+0)) (1+(1+0)) 0 0 (1+(1+(1+(1+0)))) ... >

Rather than

<3 2 0 0 4 ...>

And only when you print are those sums computed. Forcing such a deeply nested chain of additions is also what overflows the stack. I don't see a strict accum function in the docs (shame), and there isn't any place to hook in a seq. One way out of this predicament may be to use Data.Vector.Unboxed instead, since unboxed types are unlifted, a.k.a. strict. Maybe you could request a strict accum function with your example as a use case.
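The Data.Vector.Unboxed suggestion could look roughly like the sketch below. It is an untested-on-your-data illustration, not a measured fix: it assumes the vector and bytestring packages are installed, swaps the S.zip/S.repeat pair from the question for a plain list comprehension, and adds a main that takes file names from the command line (that main is my own addition, not part of the question).

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Vector.Unboxed as U
import Data.Word (Word8)
import System.Environment (getArgs)

-- Count how often each byte value occurs.
-- An unboxed vector stores raw Ints, so each (+) has to be evaluated
-- before its result is written back; no chain of thunks can build up
-- inside the elements.
histogram :: [Word8] -> U.Vector Int
histogram bytes =
  U.accum (+) (U.replicate 256 0) [(fromIntegral b, 1) | b <- bytes]

-- Same shape as mkHistogram in the question, only with the unboxed vector.
mkHistogram :: FilePath -> IO ()
mkHistogram file = do
  hist <- (histogram . BL.unpack) `fmap` BL.readFile file
  print hist

-- Hypothetical driver: print one histogram per file given as an argument.
main :: IO ()
main = getArgs >>= mapM_ mkHistogram
```

Since the input is still read lazily and the counts are now stored evaluated, I would expect the heap profile to stay flat, but it is worth re-running the profiler on the same 2.5 MB file to confirm.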
