Unique elements in a Haskell list

Okay, this is probably going to be in the Prelude, but: is there a standard library function for finding the unique elements in a list? My (re)implementation, for clarification, is:

has :: (Eq a) => [a] -> a -> Bool
has [] _ = False
has (x:xs) a
  | x == a    = True
  | otherwise = has xs a

unique :: (Eq a) => [a] -> [a]
unique [] = []
unique (x:xs)
  | has xs x  = unique xs
  | otherwise = x : unique xs

I searched for (Eq a) => [a] -> [a] on Hoogle.

First result was nub (remove duplicate elements from a list).

Hoogle is awesome.

The nub function from Data.List (no, it's actually not in the Prelude) definitely does something like what you want, but it is not quite the same as your unique function. They both preserve the original order of the elements, but unique retains the last occurrence of each element, while nub retains the first occurrence.

You can do this to make nub act exactly like unique, if that's important (though I have a feeling it's not):

unique = reverse . nub . reverse
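To make the difference concrete, here is a small sketch that runs both on the same input. The name uniqueLast is chosen here only to avoid clashing with the question's unique; it is not a library function.

```haskell
import Data.List (nub)

-- Keeps the last occurrence of each element, like the question's unique.
-- (uniqueLast is an illustrative name, not a library function.)
uniqueLast :: Eq a => [a] -> [a]
uniqueLast = reverse . nub . reverse

-- nub keeps the first occurrence of each element.
firstKept :: [Int]
firstKept = nub [1, 2, 1, 3, 2]         -- [1,2,3]

-- The reversed variant keeps the last occurrence instead.
lastKept :: [Int]
lastKept = uniqueLast [1, 2, 1, 3, 2]   -- [1,3,2]
```

Note that reverse has to walk the whole list, so this variant cannot be used on an infinite list, whereas plain nub can.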

Also, nub is only good for small lists. Its complexity is quadratic, so it starts to get slow if your list can contain hundreds of elements.

If you limit your types to types having an Ord instance, you can make it scale better. This variation on nub still preserves the order of the list elements, but its complexity is O(n * log n):

import qualified Data.Set as Set

nubOrd :: Ord a => [a] -> [a] 
nubOrd xs = go Set.empty xs where
  go s (x:xs)
   | x `Set.member` s = go s xs
   | otherwise        = x : go (Set.insert x s) xs
  go _ _              = []

In fact, it has been proposed to add nubOrd to Data.Set.
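As far as I know, a function with this name has since shipped in the containers package, in the module Data.Containers.ListUtils (containers 0.6.0.1 and later), rather than in Data.Set itself. A minimal sketch, assuming a containers version recent enough to include that module:

```haskell
import Data.Containers.ListUtils (nubOrd)

-- First occurrences are kept, in input order, as with nub.
example :: [Int]
example = nubOrd [3, 1, 3, 2, 1]             -- [3,1,2]

-- The result is produced lazily, so an infinite input is fine
-- as long as only a finite prefix of the output is demanded.
firstThree :: String
firstThree = take 3 (nubOrd (cycle "abc"))   -- "abc"
```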

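import Data.Set (toList, fromList)
-- Requires Ord; the result comes back sorted, not in the original order.
uniquify :: Ord a => [a] -> [a]
uniquify lst = toList $ fromList lst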

I think that unique should return a list of elements that only appear once in the original list; that is, any elements of the original list that appear more than once should not be included in the result.

May I suggest an alternative definition, unique_alt:

    unique_alt :: [Int] -> [Int]
    unique_alt [] = []
    unique_alt (x:xs)
        | elem x ( unique_alt xs ) = [ y | y <- ( unique_alt xs ), y /= x ]
        | otherwise                = x : ( unique_alt xs )

Here are some examples that highlight the differences between unique_alt and unique:

    unique     [1,2,1]          = [2,1]
    unique_alt [1,2,1]          = [2]

    unique     [1,2,1,2]        = [1,2]
    unique_alt [1,2,1,2]        = []

    unique     [4,2,1,3,2,3]    = [4,1,2,3]
    unique_alt [4,2,1,3,2,3]    = [4,1]
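One caveat with unique_alt as written: an element that occurs three times (or any odd number of times) survives, because the second occurrence removes it and the third adds it back, so unique_alt [1,1,1] gives [1]. A counting-based sketch avoids this. The name onlyOnce is made up here for illustration, and the function is quadratic, like unique_alt.

```haskell
-- Keep only the elements that occur exactly once in the whole list.
-- (onlyOnce is a hypothetical name for illustration; it is O(n^2).)
onlyOnce :: Eq a => [a] -> [a]
onlyOnce xs = [x | x <- xs, length (filter (== x) xs) == 1]
```

On the three examples above it agrees with unique_alt, and onlyOnce [1,1,1] is the empty list.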

I think this would do it.

unique [] = []
unique (x:xs) = x:unique (filter ((/=) x) xs)

Another way to remove duplicates:

unique :: [Int] -> [Int]
unique xs = [x | (x,y) <- zip xs [0..], x `notElem` (take y xs)]

Algorithm in Haskell to create a unique list:

data Foo = Foo { id_ :: Int
               , name_ :: String
               } deriving (Show)

alldata = [ Foo 1 "Name"
          , Foo 2 "Name"
          , Foo 3 "Karl"
          , Foo 4 "Karl"
          , Foo 5 "Karl"
          , Foo 7 "Tim"
          , Foo 8 "Tim"
          , Foo 9 "Gaby"
          , Foo 9 "Name"
          ]

isolate :: [Foo] -> [Foo]
isolate [] = []
isolate (x:xs) = (fst f) : isolate (snd f)
  where
    f = foldl helper (x,[]) xs
    -- Compare against the accumulator a (best match so far), not the head x
    helper (a,b) y = if name_ a == name_ y
                     then if id_ a >= id_ y
                          then (a,b)
                          else (y,b)
                     else (a,y:b)

main :: IO ()
main = mapM_ (putStrLn . show) (isolate alldata)

Output:

Foo {id_ = 9, name_ = "Name"}
Foo {id_ = 9, name_ = "Gaby"}
Foo {id_ = 5, name_ = "Karl"}
Foo {id_ = 8, name_ = "Tim"}

A library-based solution:

We can use the style of Haskell programming in which all looping and recursion is pushed out of user code and into suitable library functions. Those library functions are often optimized in ways that are well beyond the skills of a Haskell beginner.

One way to decompose the problem into two passes goes like this:

  1. produce a second list that is parallel to the input list, but with duplicate elements suitably marked
  2. eliminate elements marked as duplicates from that second list

For the first step, duplicate elements don't need a value at all, so we can use [Maybe a] as the type of the second list. So we need a function of type:

pass1 :: Eq a => [a] -> [Maybe a]

Function pass1 is an example of stateful list traversal where the state is the list (or set) of distinct elements seen so far. For this sort of problem, the library provides the mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]) function.

Here the mapAccumL function requires, besides the initial state and the input list, a step function argument, of type s -> a -> (s, Maybe a).

If the current element x is not a duplicate, the output of the step function is Just x and x gets added to the current state. If x is a duplicate, the output of the step function is Nothing, and the state is passed on unchanged.

Testing under the ghci interpreter:

$ ghci
 GHCi, version 8.8.4: https://www.haskell.org/ghc/  :? for help
 λ> 
 λ> stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
 λ> 
 λ> import Data.List(mapAccumL)
 λ> 
 λ> pass1 xs = mapAccumL stepFn [] xs
 λ> 
 λ> xs2 = snd $ pass1 "abacrba"
 λ> xs2
 [Just 'a', Just 'b', Nothing, Just 'c', Just 'r', Nothing, Nothing]
 λ> 

Writing a pass2 function is even easier. To filter out the Nothing values, we could use:

import Data.Maybe( fromJust, isJust)
pass2 = (map fromJust) . (filter isJust)

But why bother at all? This is precisely what the catMaybes library function does.

 λ> 
 λ> import Data.Maybe(catMaybes)
 λ> 
 λ> catMaybes xs2
 "abcr"
 λ> 

Putting it all together:

Overall, the source code can be written as:

import Data.Maybe(catMaybes)
import Data.List(mapAccumL)

uniques :: (Eq a) => [a] -> [a]
uniques = let  stepFn s x = if (elem x s) then (s, Nothing) else (x:s, Just x)
          in   catMaybes . snd . mapAccumL stepFn []

This code is reasonably compatible with infinite lists, a property occasionally referred to as being "laziness-friendly":

 λ> 
 λ> take 5 $ uniques $ "abacrba" ++ (cycle "abcrf")
 "abcrf"
 λ> 

Efficiency note: If we expect many distinct elements in the input list and an Ord a instance is available, the state can be implemented as a Set rather than a plain list, without having to alter the overall structure of the solution.
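A sketch of that Set-based variant, keeping the same mapAccumL and catMaybes structure. The name uniquesOrd is made up here for illustration; only the state type and the membership test change.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

-- Same two-pass structure as uniques, with a Set as the state.
-- (uniquesOrd is an illustrative name, not a library function.)
uniquesOrd :: Ord a => [a] -> [a]
uniquesOrd = catMaybes . snd . mapAccumL stepFn Set.empty
  where
    -- Emit Just x the first time x is seen and record it in the Set;
    -- emit Nothing for repeats and leave the Set unchanged.
    stepFn seen x
      | x `Set.member` seen = (seen, Nothing)
      | otherwise           = (Set.insert x seen, Just x)
```

Each membership test and insertion is logarithmic in the number of distinct elements seen so far, and the function should remain usable on infinite lists in the same way as uniques.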

Here's a solution that uses only Prelude functions:

uniqueList theList =
    if not (null theList)
        then head theList : filter (/= head theList) (uniqueList (tail theList))
        else []
