
Removing duplicates from a list in Haskell without elem

I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs)   | x `elem` xs   = rmdups xs
                | otherwise     = x : rmdups xs

However, I'd like to rework this without using elem. What would be the best method for this?

I'd like to do this using my own function and not nub or nubBy.

Both your code and nub have O(N^2) complexity.

You can improve the complexity to O(N log N) and avoid using elem by sorting, grouping, and taking only the first element of each group.

Conceptually:

import Data.List (group, sort)

rmdups :: (Ord a) => [a] -> [a]
rmdups = map head . group . sort

Suppose you start with the list [1, 2, 1, 3, 2, 4]. By sorting it, you get [1, 1, 2, 2, 3, 4]; by grouping that, you get [[1, 1], [2, 2], [3], [4]]; finally, by taking the head of each list, you get [1, 2, 3, 4].

The full implementation of the above just involves expanding each function.
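A sketch of what that expansion could look like, assuming it is fine to keep Data.List's sort: after sorting, equal elements sit next to each other, so group and map head can be replaced by one pass that emits an element and then skips the run of copies after it. The names rmdupsSorted and dropRuns are hypothetical:

```haskell
import Data.List (sort)

-- Deduplicate by sorting first; equal elements then become adjacent,
-- so a single left-to-right pass with (==) is enough (no elem needed).
rmdupsSorted :: Ord a => [a] -> [a]
rmdupsSorted = dropRuns . sort

-- Keep the first element of every run of equal adjacent elements,
-- which is what map head . group does.
dropRuns :: Eq a => [a] -> [a]
dropRuns [] = []
dropRuns (x : xs) = x : dropRuns (dropWhile (== x) xs)
```

For example, rmdupsSorted [1, 2, 1, 3, 2, 4] returns [1, 2, 3, 4], matching the walk-through above.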

Note that this requires the stronger Ord constraint on the elements of the list, and also changes their order in the returned list (they come out sorted).
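If the original order matters but you still want O(N log N), one option (a variation on this answer, not part of it) is to tag each element with its position, sort by value, keep the earliest-tagged element of each group, and sort the survivors back by position. A sketch; the name rmdupsStable is hypothetical:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Order-preserving deduplication in O(N log N):
--   1. pair every element with its index,
--   2. sort by element (sortOn is stable, so within a run of equal
--      elements the smallest index comes first),
--   3. keep the head of each group, i.e. the first occurrence,
--   4. sort the survivors by index to restore the original order.
rmdupsStable :: Ord a => [a] -> [a]
rmdupsStable xs =
  map snd
    . sortOn fst
    . map head
    . groupBy ((==) `on` snd)
    . sortOn snd
    $ zip [0 :: Int ..] xs
```

For example, rmdupsStable [1, 2, 1, 3, 2, 4] returns [1, 2, 3, 4] and rmdupsStable "maximum-minimum" returns "maxiu-n".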

Even easier:

import qualified Data.Set as Set

mkUniq :: Ord a => [a] -> [a]
mkUniq = Set.toList . Set.fromList

Convert the set to a list of elements in O(n) time:

 toList :: Set a -> [a]

Create a set from a list of elements in O(n log n) time:

 fromList :: Ord a => [a] -> Set a

In Python it would be no different:

def mkUniq(x):
    return list(set(x))
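Since Set.toList returns the elements in ascending order, mkUniq gives the same result as the sort-and-group answer: both need Ord, and both return the unique elements sorted rather than in their original order. A small comparison sketch (mkUniqSet and viaGroup are just illustrative names):

```haskell
import Data.List (group, sort)
import qualified Data.Set as Set

-- Deduplicate via a Set: fromList drops duplicates and toList
-- returns the remaining elements in ascending order.
mkUniqSet :: Ord a => [a] -> [a]
mkUniqSet = Set.toList . Set.fromList

-- The sort/group version from the previous answer, for comparison.
viaGroup :: Ord a => [a] -> [a]
viaGroup = map head . group . sort
```

For example, mkUniqSet "maximum-minimum" returns "-aimnux", and so does viaGroup.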

Like @scvalex's solution, the following has O(n log n) complexity and an Ord dependency. Unlike it, it preserves the order, keeping the first occurrence of each item.

import qualified Data.Set as Set

rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c

[Benchmark results chart; image not reproduced]

As you can see, the benchmark results prove this solution to be the most effective. You can find the source of this benchmark here.
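For reference, the containers package (version 0.6.0.1 and later) provides this same Set-based, order-preserving idea as nubOrd in Data.Containers.ListUtils. It is not nub or nubBy, but if library functions are allowed it saves writing the helper. A sketch placing a copy of the function above (renamed rmdupsSet here; rmdupsLib is likewise an illustrative name) next to the library version:

```haskell
import Data.Containers.ListUtils (nubOrd)
import qualified Data.Set as Set

-- Same algorithm as the answer above: walk the list once, remembering
-- the elements already emitted in a Set (O(log n) membership tests).
rmdupsSet :: Ord a => [a] -> [a]
rmdupsSet = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

-- Library equivalent (containers >= 0.6.0.1): keeps the first
-- occurrence of each element, in the original order.
rmdupsLib :: Ord a => [a] -> [a]
rmdupsLib = nubOrd
```

Because the helper emits each element as soon as it is first seen, it also works on a prefix of an infinite list: take 3 (rmdupsSet (cycle [1, 2, 3])) returns [1, 2, 3].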

I don't think you'll be able to do it without elem (or your own re-implementation of it).
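Re-implementing elem takes only a few lines. A sketch, with hypothetical names isMember and rmdupsNoElem, that keeps the question's semantics (an element is dropped if it appears again later):

```haskell
-- A hand-written replacement for elem: True if y occurs in the list.
isMember :: Eq a => a -> [a] -> Bool
isMember _ [] = False
isMember y (z : zs) = y == z || isMember y zs

-- The question's function with elem swapped for isMember; like the
-- original, it keeps the *last* occurrence of each element.
rmdupsNoElem :: Eq a => [a] -> [a]
rmdupsNoElem [] = []
rmdupsNoElem (x : xs)
  | x `isMember` xs = rmdupsNoElem xs
  | otherwise       = x : rmdupsNoElem xs
```

For example, rmdupsNoElem "abacd" returns "bacd", the same as the question's rmdups.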

However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.

*Main> rmdups "abacd"
"bacd"

The solution is to thread the 'seen' elements through as a state variable.

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
    where rdHelper seen [] = seen
          rdHelper seen (x:xs)
              | x `elem` seen = rdHelper seen xs
              | otherwise = rdHelper (seen ++ [x]) xs

This is more or less how nub is implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicates above is strict (it consumes the entire list before returning).
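A non-strict variant in the spirit of that approach (a sketch, not a copy of the library source; rmdupsLazy is a hypothetical name) emits each new element as soon as it is seen, and conses onto the seen list instead of appending to it, which also avoids the O(n) cost of seen ++ [x]:

```haskell
-- Keep the first occurrence of each element, producing output lazily.
-- 'seen' holds the elements emitted so far (in reverse order, which
-- does not matter for membership tests).
rmdupsLazy :: Eq a => [a] -> [a]
rmdupsLazy = go []
  where
    go _ [] = []
    go seen (x : xs)
      | x `elem` seen = go seen xs
      | otherwise     = x : go (x : seen) xs
```

For example, take 3 (rmdupsLazy (cycle "abc")) returns "abc", whereas removeDuplicates never returns on an infinite list.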

Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates can be implemented in one line with foldl:

removeDuplicates2 :: Eq a => [a] -> [a]
removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []

It's late to answer this question, but I want to share my own solution, which doesn't use elem and doesn't assume Ord.

rmdups' :: (Eq a) => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k | k <- rmdups' xs, k /= x ]

This solution removes the later duplicates (keeping the first occurrence of each element), while the implementation in the question removes the earlier ones (keeping the last occurrence). For example:

rmdups "maximum-minimum"
-- "ax-nium"

rmdups' "maximum-minimum"
-- "maxiu-n"

Also, the complexity of this code is O(N*K), where N is the length of the string and K is the number of unique characters in it. Since N >= K, it is O(N^2) in the worst case, but that case means there is no repetition in the string at all, which is unlikely when you are trying to remove duplicates.
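The two behaviours are mirror images: reversing the list, keeping first occurrences, and reversing back keeps last occurrences. A short sketch (keepLast is a hypothetical name) built on rmdups' from this answer:

```haskell
-- Keeps the first occurrence of each element (same as rmdups' above,
-- minus the redundant single-element case).
rmdups' :: Eq a => [a] -> [a]
rmdups' [] = []
rmdups' (x : xs) = x : [k | k <- rmdups' xs, k /= x]

-- Keeps the last occurrence instead, like the question's rmdups.
keepLast :: Eq a => [a] -> [a]
keepLast = reverse . rmdups' . reverse
```

For example, keepLast "maximum-minimum" returns "ax-nium", the question's result.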

Using recursion-schemes:

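import Data.Functor.Foldable

dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
    where pseudoalgebra Nil                 = []
          -- past: the original (not yet deduplicated) tail; xs: the result for it.
          -- x is dropped if it occurs again later, so the last occurrence is kept.
          pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs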

While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.
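Without the recursion-schemes dependency, a list paramorphism is easy to write directly: it is like foldr, but the step function also receives the untouched remainder of the list. A sketch (paraList and dedupPara are hypothetical names) mirroring dedup:

```haskell
-- A list-specialised paramorphism: the step function sees the current
-- element, the original tail, and the result already computed for
-- that tail.
paraList :: (a -> [a] -> b -> b) -> b -> [a] -> b
paraList _ z [] = z
paraList f z (x : xs) = f x xs (paraList f z xs)

-- Same logic as dedup: drop x if it occurs again later in the input,
-- so the last occurrence of each element is kept.
dedupPara :: Eq a => [a] -> [a]
dedupPara = paraList step []
  where
    step x rest acc = if x `elem` rest then acc else x : acc
```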

Graham Hutton has an rmdups function on p. 86 of Programming in Haskell. It preserves order. It is as follows:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)
rmdups "maximum-minimum"
-- "maxiu-n"

This was bothering me until I saw Hutton's function. Then I tried again. There are two versions: the first keeps the first occurrence of each element, the second keeps the last.

rmdups :: Eq a => [a] -> [a]
rmdups ls = [d | (z, d) <- zip [0..] ls, notElem d $ take z ls]

rmdups "maximum-minimum"
-- "maxiu-n"

If you want to keep the last occurrence of each element instead, as your implementation does, just change take to drop in the function and change the enumeration zip [0..] to zip [1..].
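With those two changes applied, each element is checked against the part of the list after it, so it survives only if it does not occur again later. A sketch with both versions side by side (rmdupsFirst and rmdupsLast are hypothetical names):

```haskell
-- Keep the first occurrence: d survives if it is not in the prefix
-- before it (take z ls, where z is d's index).
rmdupsFirst :: Eq a => [a] -> [a]
rmdupsFirst ls = [d | (z, d) <- zip [0 ..] ls, d `notElem` take z ls]

-- Keep the last occurrence: with indices starting at 1, drop z ls is
-- exactly the suffix after d.
rmdupsLast :: Eq a => [a] -> [a]
rmdupsLast ls = [d | (z, d) <- zip [1 ..] ls, d `notElem` drop z ls]
```

For example, rmdupsLast "maximum-minimum" returns "ax-nium", matching the question's rmdups.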

I would like to add to @fp_mora's answer that on page 136 of Programming in Haskell there is another, slightly different implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x : xs) = x : rmdups (filter (/= x) xs)

It was easier for me to wrap my head around this one.

You can use this compress function too.

cmprs :: Eq a => [a] -> [a]
-- Removes *adjacent* duplicates only, so sort the list first
-- if equal elements may be scattered.
cmprs [] = []
cmprs [a] = [a]
cmprs (a:as)
    | a == head as = cmprs as
    | otherwise    = a : cmprs as

Using dropWhile also works, but remember to sort the list before using this:

rmdups :: (Eq a) => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : (rmdups $ dropWhile (\y -> y == x) xs)

remdups :: Eq a => [a] -> [a]
remdups xs = foldr (\y ys -> y : filter (/= y) ys) [] xs

This applies the function to the first element and to the list constructed recursively in the same way. At the first iteration you basically create a list of which you only know the first element; the rest of the list is constructed in the same way (adding an element to the list) and is then filtered to remove the element that this particular iteration is adding.

So every iteration adds an element (call it X) to the front of the list and removes all other elements equal to X from the rest of it.

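...or by using the function union from Data.List with an empty first list (union x x returns x unchanged, duplicates included, because union keeps its first argument as-is and only deduplicates what it appends from the second):

import Data.List

unique :: Eq a => [a] -> [a]
unique x = union [] x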
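
remove_duplicates :: Eq a => [a] -> [a]
-- Only removes adjacent duplicates; sort the list first for a full deduplication.
remove_duplicates [] = []
remove_duplicates (x:xs)
  | null xs       = [x]
  | x == head xs  = remove_duplicates xs
  | otherwise     = x : remove_duplicates xs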

You could try doing this. I've merely replaced 'elem' with a comparison against the next element, which works for me when the list is sorted.
