
Haskell function nub inefficient

I'm confused by the implementation of the 'nub' (select unique values) function in the Haskell standard library module Data.List. The GHC implementation is:

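nub :: Eq a => [a] -> [a]
nub l                   = nub' l []
  where
    -- ls holds the elements already emitted, most recent first
    nub' [] _           = []
    nub' (x:xs) ls
        | x `elem` ls   = nub' xs ls          -- seen before: skip it
        | otherwise     = x : nub' xs (x:ls)  -- new: emit it and remember it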

As far as I can tell, this has a worst-case time complexity of O(n^2): for a list of n unique values, every element has to be compared against all the elements kept before it to establish that it is in fact unique, which is n(n-1)/2 comparisons in total.
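To make the quadratic count concrete, here is a small sketch that replays the `elem` checks of the nub above and counts the equality tests. The helper name `nubComparisons` is hypothetical, not part of Data.List; it assumes `elem` scans the seen-list from its head and stops at the first match, which is how `elem` behaves on lists.

```haskell
-- Hypothetical helper (not in Data.List): replays the `elem` checks made by
-- the nub above and counts how many (==) tests they perform.
nubComparisons :: Eq a => [a] -> Int
nubComparisons = go []
  where
    go _ [] = 0
    go seen (x:xs) =
      case break (== x) seen of
        -- not seen before: compared against every kept element, then kept
        (before, [])  -> length before + go (x : seen) xs
        -- seen before: the scan stops at the first match, and x is skipped
        (before, _:_) -> length before + 1 + go seen xs

main :: IO ()
main = do
  print (nubComparisons [1 .. 100 :: Int])    -- 100 distinct values
  print (nubComparisons (replicate 100 'a'))  -- 100 copies of one value
```

For n distinct values the count is 0 + 1 + ... + (n-1) = n(n-1)/2, i.e. 4950 for n = 100, while n copies of a single value cost only n - 1 = 99 comparisons. The expensive case is an input with many distinct values.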

If one used a hash table, the complexity could be reduced to an expected O(n) overall: O(n) for building the table plus an expected O(1) check of each value against the values already in it. Granted, this would not produce an ordered list, but that would also be possible in O(n log n) using the ordered Data.Map from the containers library that ships with GHC, if necessary.
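A minimal sketch of the ordered-container alternative mentioned above, assuming only the containers package that ships with GHC and using Data.Set (the set counterpart of Data.Map). `ordNub` keeps the first occurrence of each element in input order, like `nub`, but needs `Ord` and runs in O(n log n); `sortedNub` returns the distinct elements in ascending order. Both names are hypothetical; recent versions of containers provide a comparable `nubOrd` in `Data.Containers.ListUtils`. An expected-O(n) variant would swap `Data.Set` for `Data.HashSet` from the separate unordered-containers package, which requires `Eq` and `Hashable` instead of `Ord`.

```haskell
import qualified Data.Set as Set

-- Hypothetical name: like nub (first occurrence wins, input order kept),
-- but remembers seen elements in a Data.Set, so it needs Ord, not just Eq.
-- Each member/insert is O(log n), giving O(n log n) overall.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

-- Hypothetical name: the distinct elements in ascending order, also O(n log n).
sortedNub :: Ord a => [a] -> [a]
sortedNub = Set.toAscList . Set.fromList

main :: IO ()
main = do
  print (ordNub [3, 1, 3, 2, 1 :: Int])
  print (sortedNub [3, 1, 3, 2, 1 :: Int])
```

Like `nub`, `ordNub` is lazy in its input (`take 3 (ordNub [1 ..])` terminates), whereas `sortedNub` must consume the whole list before it can return anything.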

Why choose such an inefficient implementation for an important library function? I understand efficiency is not a main concern in Haskell, but the standard library could at least make an effort to choose the (asymptotically) best data structure for the job.

You're absolutely correct: nub is an O(n^2) algorithm. However, there are still reasons why you might want to use it instead of a hash map:

  • for small lists it still might be faster
  • nub only requires an Eq constraint; by comparison, Data.Map requires an Ord constraint on keys, and Data.HashMap requires a key type with a Hashable instance (together with Eq in unordered-containers; the older hashmap package required Ord as well)
  • it's lazy: you don't have to run through the entire input list to start getting results
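To illustrate the second point, here is a sketch with a hypothetical `Shape` type that derives only `Eq`. `nub` from Data.List accepts it; a Set- or Map-based deduplication would be rejected by the type checker because there is no `Ord Shape` instance.

```haskell
import Data.List (nub)

-- Hypothetical type with an Eq instance but no Ord or Hashable instance.
data Shape = Circle Int | Square Int
  deriving (Eq, Show)

shapes :: [Shape]
shapes = [Circle 1, Square 2, Circle 1, Square 2, Circle 3]

-- Replacing nub with a Data.Set-based dedup here would fail to compile:
-- there is no instance for (Ord Shape).
main :: IO ()
main = print (nub shapes)
```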

Edit: slight correction on the third point. You don't have to process the entire list to start getting results, because nub returns each element as soon as it finds it to be new. Producing the complete result does require examining every element of the input, so on an infinite list nub only works as long as you demand no more distinct elements than the list actually contains.
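A quick check of how far nub's laziness goes, using `nub` from Data.List: the input can be infinite as long as only finitely many distinct results are demanded and they actually occur.

```haskell
import Data.List (nub)

main :: IO ()
main = do
  -- Infinite input, finite demand: each new element is emitted as soon as it is seen.
  print (take 5 (nub [1 :: Int ..]))
  -- Only three distinct values exist, and asking for exactly three is fine.
  print (take 3 (nub (cycle "abc")))
  -- take 4 (nub (cycle "abc")) would never return: nub keeps scanning the
  -- infinite input for a fourth distinct element that never appears.
```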

Efficiency is quite a concern in Haskell; after all, the language can perform on par with Java and often beats it in terms of memory consumption, though of course it's not C.

The answer to your question is pretty simple: Data.List's nub requires only an Eq constraint, while any implementation based on Map or Set would also require Ord (or Hashable, for a hash-based structure).

https://groups.google.com/forum/m/#!msg/haskell-cafe/4UJBbwVEacg/ieMzlWHUT_IJ

In my experience, "beginner" Haskell (including the Prelude and the bad packages) simply ignores performance in many cases in favor of simplicity.

Haskell performance is a complex problem. If you aren't experienced enough to search the Haskell Platform or Hackage for alternatives to the simple nub (and especially if your input is in a list only because you haven't considered other structures), then Data.List.nub is likely not your only major performance problem, and you are probably writing code for a toy project where performance doesn't really matter.

You just have to have faith that when you get to building a large project (in code or data), you will be more experienced and know how to set up your programs more efficiently.

In other words, don't worry about it, and assume that anything in Haskell 98 that comes from the Prelude or base is likely not the most efficient way to solve a problem.
