Why does Haskell have bottom (infinite recursions)?

There are languages other than Haskell, such as Coq, that ban bottom, or undefined, or infinitely recursive definitions like:

bot :: forall a. a
bot = bot

The benefit of not having bottom is simple: all programs terminate. The compiler guarantees that there are no infinite loops and no infinite recursions.

There is also a less obvious benefit: the logic of the language (given by the Curry-Howard correspondence) is consistent; it cannot prove a contradiction. So the same language can be used to write both programs and proofs that the programs are correct. But that is maybe off-topic here.

The protection against infinite recursions is also simple: force each recursive definition to have arguments (here bot has none) and force recursive calls to be decreasing on one of those arguments. Here decreasing is in the sense of algebraic data types, seen as finite trees of constructors and values. Coq's compiler checks that the decreasing argument is an ADT (data in Haskell) and that the recursive calls are made on subtrees of the argument, typically obtained via a case of, not on other trees coming from somewhere else.
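As a rough illustration of what "decreasing on a subtree" means, here is a minimal sketch in Haskell. The names Nat, plus and toInt are made up for this example, and GHC itself performs no such termination check; the comments only describe what a checker like Coq's would look for.

```haskell
-- Peano naturals: every value is a finite tree of constructors.
data Nat = Zero | Succ Nat

-- Structurally recursive: the only recursive call is made on `m`,
-- which is a direct subterm of the first argument `Succ m`.
plus :: Nat -> Nat -> Nat
plus Zero     n = n
plus (Succ m) n = Succ (plus m n)

-- Also structurally recursive; handy for inspecting results.
toInt :: Nat -> Int
toInt Zero     = 0
toInt (Succ n) = 1 + toInt n

-- By contrast, a definition such as
--   grow (Succ m) = grow (Succ (Succ m))
-- recurses on a term that is not a subtree of its argument, so a
-- structural termination checker would reject it.
```

For instance, toInt (plus (Succ (Succ Zero)) (Succ Zero)) evaluates to 3.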

Now the cost of this language constraint: we lose Turing-completeness (because the halting problem is undecidable, any termination check must reject some programs that do terminate). That means there are terminating functions, which can be coded in Haskell using general recursion, that would be rejected by the compiler. In practice, however, the size of Coq's library shows that those exotic functions are rarely needed. Does anyone even know one of them?

There are cases where infinite loops make sense:

  • Interactive programs, where the user issues commands by clicking or typing on the keyboard, usually run forever. They wait for a command, process it and then wait for the next command, until the end of time or, more seriously, until the user issues the quit command.
  • Likewise, programs that process an infinite stream of data instead of an infinite stream of user commands, such as continuous queries on a database.

Those cases are rather specific and might be handled by new language primitives. Haskell introduced IO to track unsafe interactions. Why not declare the possibility of infinite loops in the signature of functions? Or split a complex program into a DSMS that calls Haskell functions for pure computations?
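To make the idea of declaring possible non-termination in a signature more concrete, here is a small sketch using a "delay" type. The names Delay, collatzSteps and runFor are hypothetical and chosen only for illustration. In Haskell this is merely a convention, since the language stays partial anyway; in a total language the Later constructor would be treated coinductively.

```haskell
-- A result that is either available now or needs at least one more step.
data Delay a = Now a | Later (Delay a)

-- Counts Collatz steps until reaching 1. Nobody knows whether this
-- terminates for every positive input, and the return type says so.
collatzSteps :: Integer -> Delay Int
collatzSteps = go 0
  where
    go acc 1 = Now acc
    go acc n
      | even n    = Later (go (acc + 1) (n `div` 2))
      | otherwise = Later (go (acc + 1) (3 * n + 1))

-- Runs a delayed computation for at most `fuel` steps.
-- Returns Nothing if it has not finished by then.
runFor :: Int -> Delay a -> Maybe a
runFor _ (Now x) = Just x
runFor fuel (Later d)
  | fuel <= 0 = Nothing
  | otherwise = runFor (fuel - 1) d
```

Here runFor 100 (collatzSteps 6) gives Just 8, while runFor 3 (collatzSteps 6) gives Nothing. The caller decides how long to wait, and a function whose type does not mention Delay promises, by convention, to terminate.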

EDIT

Here is an example algorithm that might clarify what changes if we switch to total programming: Euclid's algorithm for computing the GCD of 2 numbers, first in plain recursive Haskell:

euclid_gcd :: Int -> Int -> Int
euclid_gcd m n = if n <= 0 then m else euclid_gcd n (m `mod` n)

Two things can be proven about this function: that it terminates, and that it does compute the GCD of m and n. In a language accepting proof scripts, we would give the compiler a proof that (m mod n) < n, so that it concludes the recursion is decreasing on its second argument and therefore terminates.

In Haskell I doubt we can do that, so we can try to rewrite this algorithm in a structurally recursive form that the compiler can easily check. That means a recursive call must be made on the predecessor of some argument. Here m mod n won't do, so it looks like we are stuck. But as with tail recursion, we can add new arguments. If we find a bound on the number of recursive calls, we are done. The bound does not have to be precise; it just needs to be above the actual number of recursive calls. Such a bound argument is usually called a variant in the literature; I personally call it fuel. We force the recursion to stop with an error value when it runs out of fuel. Here we can take the successor of the second number, since the second argument strictly decreases at each call:

euclid_gcd_term :: Int -> Int -> Int
euclid_gcd_term m n = euclid_gcd_rec m n (n+1)
  where
    euclid_gcd_rec :: Int -> Int -> Int -> Int
    euclid_gcd_rec m n fuel =
      if fuel <= 0 then 0
      else if n <= 0 then m else euclid_gcd_rec n (m `mod` n) (fuel-1)

Here the termination proof somewhat leaks into the implementation, making it slightly harder to read. And the implementation performs useless computations on the fuel argument, which could slow it down a bit, though in this case I hope Haskell's compiler will make the cost negligible. Coq has an extraction mechanism that erases the proof part of such mixes of proofs and programs, to produce OCaml or Haskell code.

As for euclid_gcd, we would then need to prove that euclid_gcd_term does compute the GCD of n and m. That includes proving Euclid's algorithm terminates in less than n+1 steps.
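Short of a proof, the fuel bound can at least be tested on a finite range. The sketch below is not a substitute for the proof, and the names gcdFuel and fuelBoundHolds are hypothetical. It uses a variant of the recursion that returns Nothing on fuel exhaustion, so that running out of fuel cannot be mistaken for a genuine result of 0.

```haskell
-- Hypothetical variant of euclid_gcd_rec: Nothing means the fuel ran out.
gcdFuel :: Int -> Int -> Int -> Maybe Int
gcdFuel fuel m n
  | fuel <= 0 = Nothing
  | n <= 0    = Just m
  | otherwise = gcdFuel (fuel - 1) n (m `mod` n)

-- Checks, for every pair in [0..limit] x [0..limit], that n+1 units of
-- fuel are enough and that the result agrees with the Prelude's gcd.
fuelBoundHolds :: Int -> Bool
fuelBoundHolds limit =
  and [ gcdFuel (n + 1) m n == Just (gcd m n)
      | m <- [0 .. limit], n <- [0 .. limit] ]
```

Evaluating fuelBoundHolds 60 should yield True. Such a check only covers the pairs it enumerates, whereas the proof would cover all non-negative inputs.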

euclid_gcd_term is obviously more work than euclid_gcd and arguably less fun. On the other hand, once the habit is formed, I find it intellectually rewarding to know bounds for my algorithms. And when I cannot find such bounds, it usually means I don't understand my algorithms, which also usually means they are buggy. We cannot force all developers to use total programming for all programs, but wouldn't a compiler option in Haskell to do it on demand be nice?

I can't give you a comprehensive answer, but I've spent some time working in Agda over the past year, and here are some drawbacks of totality that I've seen.

Basically, when writing a program in Haskell, there are some bits of information that I have, but that I do not explicitly share with the compiler. If this information is necessary for the program to terminate without errors, then Agda forces me to make this information explicit.

Consider Haskell's Data.Map.! operator, which lets you look up an element in a map by its key. If you pass a key that is not in the map, it will throw an exception. The Agda counterpart of this operator would need to take not only the key, but also a proof that the key is in the map. This has some drawbacks:

  1. Someone would have to come up with a type that lets us express "map m contains key k", and prove lemmas about how this type interacts with insertion and deletion.
  2. Any changes to the definitions of insert and delete will likely invalidate the proofs of these lemmas.
  3. When I use this map, I have to keep track of all the membership proofs explicitly, passing them around and keeping them up to date. This is both a syntactic and a mental burden.
  4. If I care about performance, I need to make sure that all these proofs are erased at runtime.

Alternatively, I could use Maybe or Either to explicitly pass these errors around. Often this is the right thing to do, but it makes it less clear when I anticipate an error happening, and when I've simply not gone through the trouble of showing that an error is impossible. This approach also doesn't work as well with interactive debuggers: I can easily break on an exception, but not so easily on the construction of a Nothing.
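For comparison, here is a small sketch of the three styles in Haskell. It assumes the containers package (which ships with GHC) for Data.Map; the map ages and the functions ageOf, ageOfMaybe and ageOfEither are made-up names used only for illustration.

```haskell
import qualified Data.Map as Map

ages :: Map.Map String Int
ages = Map.fromList [("alice", 30), ("bob", 25)]

-- Partial: throws an exception if the key is absent. The knowledge
-- that the key exists lives only in the programmer's head.
ageOf :: String -> Int
ageOf name = ages Map.! name

-- Total: a missing key is reflected in the result type.
ageOfMaybe :: String -> Maybe Int
ageOfMaybe name = Map.lookup name ages

-- Total, with an error message attached for the caller to handle.
ageOfEither :: String -> Either String Int
ageOfEither name =
  maybe (Left ("no such key: " ++ name)) Right (Map.lookup name ages)
```

With these definitions, ageOfMaybe "carol" is Nothing and ageOfEither "carol" is Left "no such key: carol", whereas ageOf "carol" throws at runtime. Every caller of the total versions has to handle the failure case, even where the key is known to be present.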

I've been focusing on errors in the above, but the same things hold for non-termination.

This isn't to say that total languages are useless; as you say, they have many benefits. So far, I just wouldn't say that those benefits obviously outweigh these drawbacks for all applications.
