How do I find allocation in a Haskell function compiled with GHC?

Question

I'm using GHC 7.4 to compile the following function:

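import Data.Bits (setBit, testBit)

-- True if the list of small integers contains no duplicates.
nodups' :: [Int] -> Bool
nodups' = ok empty
  where ok _ [] = True
        ok seen (n:ns) = not (n `member` seen) && ok (n `insert` seen) ns
        -- The set is a bit vector held in a single Int.
        member n word = testBit word n
        insert n word = setBit word n
        empty = 0 :: Int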

The function looks for duplicate elements in a list of small integers. The set seen is a representation of a set of small integers as a bit vector. The profiler (run with ghc -prof -auto-all) claims that the ok function accounts for 22% of allocation overall. Looking at the output of -ddump-simpl, I can't understand why this code is allocating. I checked, and as far as I can tell it is not allocating a thunk for the call to insert.

What should I look at to identify the part of my code that is allocating?

Answer 1

Generally

I know of simple (academic) implementations of functional languages, and if I remember correctly there is the G-Machine, which may be used with Haskell.

This means (again, if I remember correctly) that your program state is represented as a tree, where the nodes are (for the sake of simplicity here) the functions you use in your code. The leaves are the arguments to them. The G-Machine then looks along the "spine" (the left-side chain of nodes) and searches the set of available functions ("supercombinators"?) for a pattern match that it can apply. If a match against the left-hand side of a definition is recognized, it is replaced by the right-hand side of that definition.
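The tree-and-spine picture described here can be sketched in a few lines of Haskell. This is only a toy model for illustration, not how GHC executes code; the type `Expr`, the function `unwind`, and the supercombinator name `"add"` are all hypothetical.

```haskell
-- A toy expression graph. Application nodes form the spine; the leaves
-- are supercombinator names or literal arguments. All names here are
-- hypothetical and exist only for this sketch.
data Expr
  = App Expr Expr   -- a function applied to one argument
  | Fun String      -- a named supercombinator
  | Lit Int         -- a literal argument
  deriving (Eq, Show)

-- Walk down the left-hand chain of App nodes (the spine), collecting the
-- arguments on the way, until the head of the application is reached.
unwind :: Expr -> (Expr, [Expr])
unwind = go []
  where
    go args (App f x) = go (x : args) f
    go args hd        = (hd, args)

-- The graph for the expression `add 1 2`.
example :: Expr
example = App (App (Fun "add") (Lit 1)) (Lit 2)
```

Unwinding `example` yields the head `Fun "add"` together with its arguments in order, which is the information a reducer would need before picking a definition to apply.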

This means that even a simple line like

ok seen (n:ns) = not (n `member` seen) && ok (n `insert` seen) ns

or even

(n:ns) = ns

is doing something in computer memory, i.e. matching the pattern

       ...
     ...
    (:)
   /   \
  n     ns

and replacing it with

       ...
     ...
    ns

The final result might consume less memory than the input, but this is a dynamic step and therefore must take place somewhere. If this is repeated over and over again (in a "tight loop"), it will keep your CPU busy, and your memory as well, simply because the G-Machine is operating. (As I said, I am not sure the G-Machine concept applies here, but I guess it is something similar.)
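The match-and-replace step drawn above could be modelled roughly as follows. It is a sketch under the same assumption as the text (a graph-rewriting machine), and the names `Node`, `Tail` and `step` are made up for the illustration.

```haskell
-- Toy graph nodes; all constructor names are hypothetical.
data Node
  = Cons Node Node   -- the (:) node with children n and ns
  | Nil              -- the empty list
  | Num Int          -- a small integer element
  | Tail Node        -- an application of a tail-like function
  deriving (Eq, Show)

-- One reduction step: if the graph matches the left-hand side
-- (a tail-like function applied to a cons cell), return the
-- right-hand side (the ns child). Otherwise no rule applies.
step :: Node -> Maybe Node
step (Tail (Cons _ ns)) = Just ns
step _                  = Nothing
```

In this model the result of a step is an existing subgraph (`ns`), so the rewrite itself builds nothing new; whether a real implementation allocates at such a step depends on how it represents and updates the graph.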

Specific guesses

    member n word = testBit word n
    insert n word = setBit word n

Besides that, I have some suspicions. testBit and setBit look like index operations on lists. If they are, they could take some work. If they operate on proper arrays, that would be fine. If they operate on some sort of map or set... well... there might be costly hashing involved? Or they might be implemented via a balanced tree, which uses lots of (costly?) comparison operations?
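One way to check these guesses against the question's code: there, `empty = 0 :: Int`, so `word` is a plain `Int`, and `testBit` and `setBit` are the `Data.Bits` operations on that machine word rather than lookups in a list, map or tree. A minimal sketch, with type signatures added and a hypothetical `demo` value:

```haskell
import Data.Bits (setBit, testBit)

-- The question's set operations, with explicit Int types added.
member :: Int -> Int -> Bool
member n word = testBit word n

insert :: Int -> Int -> Int
insert n word = setBit word n

-- Insert 3 and 5 into the empty set (0). Bits 3 and 5 are then set,
-- so the resulting word is 8 + 32 = 40.
demo :: Int
demo = insert 5 (insert 3 0)
```

Here `member 3 demo` is `True` and `member 4 demo` is `False`. No hashing or tree comparison is involved in either operation.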

Solution 1
1 2012-12-18 09:45:49
