Why is this F# sequence function not tail recursive?

Question

Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem; it was merely circumvented.

A fairly standard implementation of (monadic, if we're going to use big words) sequence is:

let sequence l = 
    let k m m' = gen { let! x = m
                       let! xs = m'
                       return (x::xs) }
    List.foldBack k l (gen { return [] })

Here gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so it eventually overflows the stack if the list is long enough. The question is: why? I know that in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?

If I change the implementation to the below, everything is fine:

let sequence l =
    let rec go gs acc size r0 = 
        match gs with
        | [] -> List.rev acc
        | (Gen g)::gs' ->
            let r1,r2 = split r0
            let y = g size r1
            go gs' (y::acc) size r2
    Gen(fun n r -> go l [] n r)

For completeness, the Gen type and computation builder can be found in the FsCheck source.

Answer 1

Building on Tomas's answer, let's define two modules:

module Kurt = 
    type Gen<'a> = Gen of (int -> 'a)

    let unit x = Gen (fun _ -> x)

    let bind k (Gen m) =     
        Gen (fun n ->       
            let (Gen m') = k (m n)       
            m' n)

    type GenBuilder() =
        member x.Return(v) = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()


module Tomas =
    type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

    let unit x = Gen (fun _ f -> f x)

    let bind k (Gen m) =     
        Gen (fun n f ->       
            m n (fun r ->         
                let (Gen m') = k r        
                m' n f))

    type GenBuilder() =
        member x.Return v = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()

To simplify things a bit, let's rewrite your original sequence function as

let rec sequence = function
| [] -> gen { return [] }
| m::ms -> gen {
    let! x = m
    let! xs = sequence ms
    return x::xs }

Now, sequence [for i in 1.. 100000 -> unit i] will run to completion regardless of whether sequence is defined in terms of Kurt.gen or Tomas.gen. The issue is not that sequence itself causes a stack overflow when using your definitions; it's that the function returned from the call to sequence causes a stack overflow when it is called.
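One way to check the "building is cheap, running is expensive" claim is to count how often sequence is entered. The sketch below reuses the Kurt-style definitions; the counter calls and the names callsAfterBuild and callsAfterRun are made up for illustration, and the list is kept tiny so nothing overflows.

```fsharp
type Gen<'a> = Gen of (int -> 'a)

let unit x = Gen (fun _ -> x)

let bind k (Gen m) =
    Gen (fun n ->
        let (Gen m') = k (m n)
        m' n)

type GenBuilder() =
    member x.Return(v) = unit v
    member x.Bind(v, f) = bind f v

let gen = GenBuilder()

// Illustrative counter: how many times `sequence` has been entered so far.
let calls = ref 0

let rec sequence (gens: Gen<int> list) : Gen<int list> =
    calls.Value <- calls.Value + 1
    match gens with
    | [] -> gen { return [] }
    | m :: ms ->
        gen {
            let! x = m
            let! xs = sequence ms   // sits inside the Bind continuation, so it runs later
            return x :: xs
        }

// Building the generator only enters `sequence` once.
let built = sequence [ for i in 1 .. 5 -> unit i ]
let callsAfterBuild = calls.Value

// Running it triggers the remaining recursive calls, nested inside each other.
let (Gen f) = built
let result = f 0
let callsAfterRun = calls.Value
```

With five elements, the recursive calls all happen while f is running, which is where the stack depth comes from.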

To see why this is so, let's expand the definition of sequence in terms of the underlying monadic operations:

let rec sequence = function
| [] -> unit []
| m::ms ->
    bind (fun x -> bind (fun xs -> unit (x::xs)) (sequence ms)) m

Inlining the Kurt.unit and Kurt.bind values and simplifying like crazy, we get

let rec sequence = function
| [] -> Kurt.Gen(fun _ -> [])
| (Kurt.Gen m)::ms ->
    Kurt.Gen(fun n ->
            let (Kurt.Gen ms') = sequence ms
            (m n)::(ms' n))

Now it's hopefully clear why calling let (Kurt.Gen f) = sequence [for i in 1.. 1000000 -> unit i] in f 0 overflows the stack: f requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.
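For contrast, here is a minimal sketch of the same simplified Gen type with the list walked by an explicit accumulator loop inside a single Gen, in the spirit of the workaround from the question (without the random-state splitting). The name sequenceIter is invented for this example.

```fsharp
type Gen<'a> = Gen of (int -> 'a)

let unit x = Gen (fun _ -> x)

// Runs every generator inside one tail-recursive loop, so the stack depth
// stays constant regardless of the length of the list.
let sequenceIter (gens: Gen<'a> list) : Gen<'a list> =
    Gen (fun n ->
        let rec go gs acc =
            match gs with
            | [] -> List.rev acc                        // results were accumulated in reverse
            | (Gen g) :: rest -> go rest (g n :: acc)   // self tail call
        go gens [])

let (Gen f) = sequenceIter [ for i in 1 .. 1000000 -> unit i ]
let result = f 0
```

The price is that this version no longer goes through bind, so it only works for a Gen type you can unwrap directly.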

Inlining Tomas.unit and Tomas.bind into the definition of sequence instead, we get the following simplified version:

let rec sequence = function
| [] -> Tomas.Gen (fun _ f -> f [])
| (Tomas.Gen m)::ms ->
    Tomas.Gen(fun n f ->  
        m n (fun r ->
            let (Tomas.Gen ms') = sequence ms
            ms' n (fun rs ->  f (r::rs))))

Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the Gen instances in the list that's passed in, and it is possible to blow the stack for inputs that don't themselves call their continuation in tail position:

// ok
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> unit i]
f 0 (fun list -> printfn "%i" list.Length)

// not ok...
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> Gen(fun _ f -> f i; printfn "%i" i)]
f 0 (fun list -> printfn "%i" list.Length)
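To see why the second input is a problem, it may help to record the order of events on a tiny list. In this sketch, trace and noisy are illustrative names; each noisy generator does some work after calling its continuation, just like the "not ok" generators above.

```fsharp
type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

let unit x = Gen (fun _ f -> f x)

let bind k (Gen m) =
    Gen (fun n f ->
        m n (fun r ->
            let (Gen m') = k r
            m' n f))

let rec sequence gens =
    match gens with
    | [] -> unit []
    | m :: ms ->
        m |> bind (fun x ->
            sequence ms |> bind (fun xs -> unit (x :: xs)))

// Records events in the order they happen.
let trace = System.Collections.Generic.List<string>()

// Calls the continuation first and then does more work,
// so the continuation call is NOT in tail position.
let noisy (i: int) =
    Gen (fun _ f ->
        f i
        trace.Add (sprintf "after %d" i))

let (Gen run) = sequence [ for i in 1 .. 3 -> noisy i ]
run 0 (fun xs -> trace.Add (sprintf "done %d" (List.length xs)))

let events = List.ofSeq trace
```

The final continuation fires before any of the "after" steps, and those then unwind in reverse order. Every generator's frame is therefore still on the stack when the last one runs, which is what overflows for a million elements.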

Answer 2

You're correct: the reason you're getting a stack overflow is that the bind operation of the monad needs to be tail-recursive (because it is used to aggregate values during folding).

The monad used in FsCheck is essentially a state monad (it keeps the current generator and some number). I simplified it a bit and got something like:

type Gen<'a> = Gen of (int -> 'a)

let unit x = Gen (fun n -> x)

let bind k (Gen m) = 
    Gen (fun n -> 
      let (Gen m') = k (m n) 
      m' n)

Here, the bind function is not tail-recursive because it calls k and then does some more work. You can change the monad to be a continuation monad. It is implemented as a function that takes the state and a continuation, that is, a function that is called with the result as an argument. For this monad, you can make bind tail recursive:

type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

let unit x = Gen (fun n f -> f x)

let bind k (Gen m) = 
    Gen (fun n f -> 
      m n (fun r -> 
        let (Gen m') = k r
        m' n f))

The following example will not overflow the stack (whereas it did with the original implementation):

let sequence l = 
  let k m m' = 
    m |> bind (fun x ->
      m' |> bind (fun xs -> 
        unit (x::xs)))
  List.foldBack k l (unit [])

let (Gen f) = sequence [ for i in 1 .. 100000 -> unit i ]
f 0 (fun list -> printfn "%d" list.Length)
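If you need the generated value back rather than printing it inside the continuation, one option is a small helper that captures the result in a ref cell. This is only a sketch: run is a hypothetical helper, not part of FsCheck, and it assumes the continuation is invoked synchronously and exactly once.

```fsharp
type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

let unit x = Gen (fun _ f -> f x)

let bind k (Gen m) =
    Gen (fun n f ->
        m n (fun r ->
            let (Gen m') = k r
            m' n f))

let sequence l =
    let k m m' =
        m |> bind (fun x ->
            m' |> bind (fun xs ->
                unit (x :: xs)))
    List.foldBack k l (unit [])

// Hypothetical helper: runs the generator with size n and returns
// whatever value was passed to the final continuation, if any.
let run n (Gen m) =
    let result = ref None
    m n (fun x -> result.Value <- Some x)
    result.Value

let answer = run 0 (sequence [ for i in 1 .. 5 -> unit i ])
```

Note that the constant stack usage of the continuation version relies on the compiler actually emitting tail calls; with tail calls disabled (as is common in debug builds), long lists may still overflow.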

Answer 1: 8, accepted, 2011-07-07 18:02:07

Answer 2: 4, 2011-05-30 21:04:01
