F# vs OCaml: Stack overflow

Question

I recently found a presentation about F# for Python programmers, and after watching it, I decided to implement a solution to the "ant puzzle" on my own.

There is an ant that can walk around on a planar grid. The ant can move one space at a time left, right, up or down. That is, from the cell (x, y) the ant can go to cells (x+1, y), (x-1, y), (x, y+1), and (x, y-1). Points where the sum of the digits of the x and y coordinates is greater than 25 are inaccessible to the ant. For example, the point (59, 79) is inaccessible because 5 + 9 + 7 + 9 = 30, which is greater than 25. The question is: how many points can the ant access if it starts at (1000, 1000), including (1000, 1000) itself?
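The accessibility rule is easy to get subtly wrong, so here is a minimal sketch of it in Python (not the author's OCaml code, which is not reproduced in this post). The names `digit_sum` and `accessible` are made up for illustration, and ignoring the sign of a coordinate is an assumption: the puzzle text does not say how negative coordinates should be treated.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (the sign is ignored; this is an assumption)."""
    return sum(int(d) for d in str(abs(n)))


def accessible(x: int, y: int, limit: int = 25) -> bool:
    """True if the ant may stand on (x, y): the combined digit sum must not exceed limit."""
    return digit_sum(x) + digit_sum(y) <= limit


print(digit_sum(59) + digit_sum(79))  # 30, the worked example from the puzzle text
print(accessible(59, 79))             # False
print(accessible(1000, 1000))         # True
```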

I implemented my solution in 30 lines of OCaml first, and tried it out:

$ ocamlopt -unsafe -rectypes -inline 1000 -o puzzle ant.ml
$ time ./puzzle
Points: 148848

real    0m0.143s
user    0m0.127s
sys     0m0.013s

Neat, my result is the same as that of Leonardo's implementations in D and C++. Compared to Leonardo's C++ implementation, the OCaml version runs approximately 2 times slower. Which is OK, given that Leonardo used a queue to remove recursion.
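For readers who have not seen the queue approach, the following is a rough Python sketch of a traversal that uses an explicit queue instead of recursion. It is not Leonardo's code and not my OCaml version; `count_reachable` and its parameters are hypothetical names, and ignoring the sign of a coordinate is an assumption. With the default arguments it should reproduce the 148848 reported above.

```python
from collections import deque


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n (sign ignored, which is an assumption)."""
    return sum(int(d) for d in str(abs(n)))


def count_reachable(start=(1000, 1000), limit=25) -> int:
    """Count cells reachable from start, using an explicit queue (no recursion)."""

    def accessible(x, y):
        return digit_sum(x) + digit_sum(y) <= limit

    if not accessible(*start):
        return 0
    seen = {start}          # cells already discovered
    queue = deque([start])  # cells whose neighbours still need checking
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in seen and accessible(*nxt):
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    print("Points:", count_reachable())
```

Because the pending work lives in a heap-allocated queue, the call stack stays at a constant depth no matter how large the reachable region is.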

I then translated the code to F# ... and here's what I got:

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/FSharp-2.0.0.0/bin/fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 2.0.0.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException.
Quit

Thanassis@HOME /g/Tmp/ant.fsharp
$ /g/Program\ Files/Microsoft\ F#/v4.0/Fsc.exe ant.fs
Microsoft (R) F# 2.0 Compiler build 4.0.30319.1
Copyright (c) Microsoft Corporation. All Rights Reserved.

Thanassis@HOME /g/Tmp/ant.fsharp
$ ./ant.exe

Process is terminated due to StackOverflowException

Stack overflow... with both versions of F# I have on my machine... Out of curiosity, I then took the generated binary (ant.exe) and ran it under Arch Linux/Mono:

$ mono -V | head -1
Mono JIT compiler version 2.10.5 (tarball Fri Sep  9 06:34:36 UTC 2011)

$ time mono ./ant.exe
Points: 148848

real    1m24.298s
user    0m0.567s
sys     0m0.027s

Surprisingly, it runs under Mono 2.10.5 (i.e. no stack overflow), but it takes 84 seconds, i.e. 587 times slower than OCaml. Oops.

So this program...

runs fine under OCaml
doesn't work at all under .NET/F#
works, but is very slow, under Mono/F#.

Why?

EDIT: Weirdness continues. Using "--optimize+ --checked-" makes the problem disappear, but only under Arch Linux/Mono; under Windows XP and Windows 7/64-bit, even the optimized version of the binary stack overflows.

Final EDIT: I found out the answer myself - see below.

Answer 1

Executive summary:

I wrote a simple implementation of an algorithm... that wasn't tail-recursive.
I compiled it with OCaml under Linux.
It worked fine, and finished in 0.14 seconds.

It was then time to port to F#.

I translated the code (direct translation) to F#.
I compiled under Windows, and ran it - I got a stack overflow.
I took the binary under Linux, and ran it under Mono.
It worked, but ran very slowly (84 seconds).

I then posted to Stack Overflow - but some people decided to close the question (sigh).

I tried compiling with --optimize+ --checked-
The binary still stack overflowed under Windows...
...but ran fine (and finished in 0.5 seconds) under Linux/Mono.

It was time to check the stack size: Under Windows, another SO post pointed out that it is set by default to 1MB. Under Linux, "ulimit -s" and a compilation of a test program clearly showed that it is 8MB.
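The test program mentioned above is not shown in this post. As a stand-in, here is a small Python sketch that reads the same limit through the standard `resource` module; it is Unix-only, and the function name `stack_limit_kib` is made up for illustration.

```python
import resource  # Unix-only standard-library module


def stack_limit_kib():
    """Return the soft stack limit in KiB, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024  # getrlimit reports bytes


print("stack limit (KiB):", stack_limit_kib())
```

On a Linux box with the usual 8MB default this would be expected to print 8192, but the value depends on the shell and system configuration.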

This explained why the program worked under Linux and not under Windows (the program used more than 1MB of stack). It didn't explain why the optimized version ran so much better under Mono than the non-optimized one: 0.5 seconds vs 84 seconds (even though --optimize+ appears to be set by default, see the comment by Keith with the "Expert F#" extract). It probably has to do with the garbage collector of Mono, which was somehow driven to extremes by the first version.

The difference between Linux/OCaml and Linux/Mono/F# execution times (0.14 vs 0.5 seconds) is because of the simple way I measured it: "time ./binary ..." measures the startup time as well, which is significant for Mono/.NET (well, significant for this simple little problem).

Anyway, to solve this once and for all, I wrote a tail-recursive version, where the recursive call at the end of the function is transformed into a loop (and hence no stack usage is necessary, at least in theory).
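To make the "recursive call becomes a loop" idea concrete, here is a toy Python sketch (not the ant solver). Python does not eliminate tail calls, so the rewrite that the OCaml and F# compilers can do automatically is done by hand here. The function names are hypothetical.

```python
def sum_to_recursive(n, acc=0):
    """Accumulator style: the recursive call is the very last thing done."""
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)  # tail call, nothing left to do afterwards


def sum_to_loop(n):
    """The same computation after the tail call is turned into a loop."""
    acc = 0
    while n != 0:
        acc += n  # what the recursive call passed as the new acc
        n -= 1    # what the recursive call passed as the new n
    return acc


print(sum_to_loop(1_000_000))  # 500000500000, with constant stack depth

try:
    sum_to_recursive(1_000_000)
except RecursionError:
    # Without tail-call elimination, every call still takes a stack frame.
    print("recursive version ran out of stack depth")
```

The loop reuses the same two variables on every iteration, which is exactly why a tail call needs no extra stack once it is eliminated.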

The new version ran fine under Windows as well, and finished in 0.5 seconds.

So, moral of the story:

Beware of your stack usage, especially if you use lots of it and run under Windows. Use EDITBIN with the /STACK option to set your binaries to larger stack sizes, or better yet, write your code in a manner that doesn't depend on using too much stack.
OCaml may be better at tail-recursion elimination than F# - or its garbage collector is doing a better job at this particular problem.
Don't despair about ...rude people closing your Stack Overflow questions, good people will counteract them in the end - if the questions are really good :-)
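On the first point: EDITBIN patches a Windows executable and is not shown here. As a loose analogue only, the Python sketch below runs a deliberately deep, non-tail-recursive function in a worker thread that was given a larger stack. The helper name `run_with_big_stack` and the chosen sizes are assumptions for illustration; in Python the interpreter's recursion limit has to be raised as well.

```python
import sys
import threading


def depth(n):
    """Deliberately non-tail-recursive: the addition happens after the call returns."""
    return 0 if n == 0 else 1 + depth(n - 1)


def run_with_big_stack(fn, *args, stack_bytes=64 * 1024 * 1024, max_depth=50_000):
    """Run fn(*args) in a worker thread created with a larger stack."""
    result = {}

    def worker():
        result["value"] = fn(*args)

    old_limit = sys.getrecursionlimit()
    old_size = threading.stack_size(stack_bytes)  # applies to threads created afterwards
    sys.setrecursionlimit(max_depth)
    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    finally:
        # Restore the previous settings so the rest of the program is unaffected.
        sys.setrecursionlimit(old_limit)
        threading.stack_size(old_size)
    return result.get("value")


print(run_with_big_stack(depth, 20_000))  # 20000
```

This only moves the limit; as the moral above says, code that does not depend on a deep stack is the more robust fix.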

PS Some additional input from Dr. Jon Harrop:

...you were just lucky that OCaml didn't overflow as well. You already identified that actual stack sizes vary between platforms. Another facet of the same issue is that different language implementations eat stack space at different rates and have different performance characteristics in the presence of deep stacks. OCaml, Mono and .NET all use different data representations and GC algorithms that impact these results... (a) OCaml uses tagged integers to distinguish pointers, giving compact stack frames, and will traverse everything on the stack looking for pointers. The tagging essentially conveys just enough information for the OCaml run time to be able to traverse the heap. (b) Mono treats words on the stack conservatively as pointers: if, as a pointer, a word would point into a heap-allocated block then that block is considered to be reachable. (c) I do not know .NET's algorithm but I wouldn't be surprised if it ate stack space faster and still traversed every word on the stack (it certainly suffers pathological performance from the GC if an unrelated thread has a deep stack!)... Moreover, your use of heap-allocated tuples means you'll be filling the nursery generation (e.g. gen0) quickly and, therefore, causing the GC to traverse those deep stacks often...

Answer 2

Let me try to summarize the answer.

There are 4 points to be made:

problem: stack overflow happens on a recursive function
it happens only under Windows: on Linux, for the problem size examined, it works
same (or similar) code in OCaml works
the --optimize+ compiler flag, for the problem size examined, works

It is very common that a stack overflow exception is the result of a recursive call. If the call is in tail position, the compiler may recognize it and apply tail call optimization, so the recursive call(s) will not take up stack space. Tail call optimization may happen in F#, in the CLR, or in both:

CLR tail optimization 1

F# recursion (more general) 2

F# tail calls 3
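To illustrate why a call in tail position needs no extra stack, here is a small Python sketch of a trampoline. This is a hand-rolled imitation of the effect of tail call optimization, not how F# or the CLR implement it, and Python itself performs no such optimization. The names `trampoline` and `count_down` are hypothetical.

```python
def trampoline(fn, *args):
    """Run fn, then keep calling any zero-argument callable it hands back."""
    result = fn(*args)
    while callable(result):
        result = result()  # the previous frame has already returned by now
    return result


def count_down(n, acc=0):
    """Tail form: return the final answer, or a thunk describing the next call."""
    if n == 0:
        return acc
    # Nothing remains to be done in this frame after the "call",
    # so it can be handed back to the trampoline instead of being nested.
    return lambda: count_down(n - 1, acc + 1)


print(trampoline(count_down, 200_000))  # 200000
```

The trick works only because nothing is left to do after the recursive step; a function that still has to combine the result of the call (as in `1 + f(n - 1)`) cannot be treated this way without restructuring.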

The correct explanation for "fails on Windows, not on Linux" is, as others said, the default reserved stack space on the two OSes. Or better, the reserved stack space used by the compilers under the two OSes. By default, VC++ reserves only 1MB of stack space. The CLR is (likely) compiled with VC++, so it has this limitation. Reserved stack space can be increased at compile time, but I'm not sure if it can be modified on compiled executables.

EDIT: it turns out that it can be done (see this blog post: http://www.bluebytesoftware.com/blog/2006/07/04/ModifyingStackReserveAndCommitSizesOnExistingBinaries.aspx ). I would not recommend it, but in extreme situations at least it is possible.

The OCaml version may work because it was run under Linux. However, it would be interesting to also test the OCaml version under Windows. I know that the OCaml compiler is more aggressive at tail-call optimization than F#... could it even extract a tail-recursive function from your original code?

My guess about "--optimize+" is that it will still cause the code to recurse, hence it will still fail under Windows, but will mitigate the problem by making the executable run faster.

Finally, the definitive solution is to use tail recursion (by rewriting the code or by relying on aggressive compiler optimization); it is a good way to avoid stack overflow problems with recursive functions.


Answer 1: score 74, accepted, 2011-09-30 13:31:11

Answer 2: score 8, 2011-09-30 10:05:59
