Checking if all elements in a list are unique

Question

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():  # itervalues() on Python 2
...     if count > 1:
...         pass  # do something

Can I do better?

Answer 1

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won't make much of a difference for short lists.

Answer 2

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
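
To see why the two-liner exits early: set.add returns None, which is falsy, so `i in seen or seen.add(i)` is truthy only when i was already seen, and any() stops at the first truthy value. A minimal sketch of that behaviour follows; `counting_iter` and `consumed` are hypothetical helpers added here only to count how many items are actually read.

```python
def all_unique(iterable):
    seen = set()
    # seen.add(i) returns None (falsy), so the expression is truthy
    # only when i is already in seen; any() then stops immediately.
    return not any(i in seen or seen.add(i) for i in iterable)


consumed = []  # hypothetical bookkeeping, for illustration only


def counting_iter(items):
    # Yield items one by one, recording each item that gets read.
    for item in items:
        consumed.append(item)
        yield item


result = all_unique(counting_iter("ABACDEF"))
print(result, len(consumed))
```

Only the first three characters are read here, because the second "A" is the first repeat.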

Answer 3

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small cases, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
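
One practical difference, sketched below under the assumption that the input may be any iterable: the loop version only needs to iterate, so it also accepts generators and even endless iterators, whereas len(x) needs a sized container. The `endless` example is a hypothetical illustration, not part of the original answer.

```python
from itertools import cycle


def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False  # stop at the first repeat
        s.add(x)
    return True


# cycle() never ends, so building set(endless) would never finish,
# but the early exit stops as soon as 1 comes around again.
endless = cycle([1, 2, 3])
found_unique = unique_values(endless)

# A generator has no len(), yet it can still be checked.
gen_unique = unique_values(n * n for n in range(5))
```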

Answer 4

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

Answer 5

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

Answer 6

As an alternative to a set, you can use a dict.

len({}.fromkeys(x)) == len(x)
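
One reason to reach for a dict rather than a set, sketched here assuming Python 3.7 or later (where dicts keep insertion order): the same object that answers the uniqueness question also gives the first occurrences in their original order. The variable names are illustrative.

```python
x = [3, 1, 3, 2, 1]

# Keys keep first-seen order on Python 3.7+; the values are all None.
deduped = dict.fromkeys(x)

has_duplicates = len(deduped) != len(x)
first_occurrences = list(deduped)
print(has_duplicates, first_occurrences)
```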

Answer 7

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.
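
The sort is needed because groupby only merges adjacent equal items. As a rough sketch of the same idea turned around, the hypothetical helper `repeated_values` below reports which values occur more than once instead of returning a boolean; it assumes the elements can be ordered.

```python
from itertools import groupby


def repeated_values(seq):
    # After sorting, equal items are adjacent, so each group holds
    # every occurrence of one value.
    return [key for key, group in groupby(sorted(seq))
            if sum(1 for _ in group) > 1]


dupes = repeated_values([1, 1, 1, 2, 3, 4, 5, 6, 2])
print(dupes)
```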

Answer 8

Here is a recursive O(N²) version for fun:

def is_unique(lst):
    # Lists of length 0 or 1 are trivially unique.
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True

Answer 9

Here is a recursive early-exit function:

def distinct(L):
    # Base case: empty and single-element lists have no duplicates.
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    else:
        return distinct(T)

It's fast enough for me without using weird (slow) conversions, while keeping a functional-style approach.

Answer 10

How about this:

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        # most_common(1) yields the single highest count.
        return Counter(lst).most_common(1)[0][1] == 1
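
For readers unfamiliar with Counter.most_common, a small sketch of what the expression unpacks: most_common(1) returns a one-element list of (element, count) pairs, and an empty list for an empty Counter, which is why the empty-list guard is there. The sample data is the list from the question.

```python
from collections import Counter

counts = Counter([1, 1, 1, 2, 3, 4, 5, 6, 2])

# A list holding the single most frequent (element, count) pair.
top = counts.most_common(1)
element, count = top[0]

# With no elements there is nothing to index, hence the guard.
empty_top = Counter([]).most_common(1)
print(top, empty_top)
```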

Answer 11

All the answers above are good, but I prefer the all_unique example from 30 seconds of python.

Use set() on the given list to remove duplicates, then compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False

Answer 12

If and only if you have the data processing library pandas in your dependencies, there's an already implemented solution which gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique

Answer 13

Using a similar approach in a Pandas dataframe to test whether a column contains only unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.

Answer 14

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.
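
The idfun parameter is what sets this apart from a plain set: uniqueness is judged on a derived marker rather than on the item itself. A minimal sketch of that idea with an early exit; `all_unique_by` and its `key` parameter are hypothetical names, not part of the benchmark code referenced in this answer.

```python
def all_unique_by(seq, key=None):
    # key plays the role of idfun: it maps each item to the marker
    # that is actually compared. Markers must be hashable.
    seen = set()
    for item in seq:
        marker = item if key is None else key(item)
        if marker in seen:
            return False
        seen.add(marker)
    return True


names = ["Alice", "bob", "ALICE"]
exact = all_unique_by(names)                       # compares raw strings
ignoring_case = all_unique_by(names, key=str.lower)
print(exact, ignoring_case)
```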

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

Answer 15

It does not fully fit the question, but if you google the task I had, this question ranks first, so it might be of interest to users as it is an extension of the question. If you want to check for each list element whether it is unique or not, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But once your list is longer than about 100 elements, the count function takes quite long. Thus the approach shown in the get_unique function is much faster, although it looks more complicated.
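
A different technique from either function above, offered only as a sketch: counting every element once with collections.Counter and then looking each one up avoids both the repeated list.count scans and the sort. It assumes the elements are hashable; `get_unique_using_counter` is a hypothetical name.

```python
from collections import Counter


def get_unique_using_counter(mylist):
    # One pass to count, one pass to look up each element's count.
    counts = Counter(mylist)
    return [counts[item] == 1 for item in mylist]


flags = get_unique_using_counter([4, 7, 4, 9])
print(flags)
```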

Answer 16

If the list is already sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
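
The same adjacent-pair comparison can be written without index arithmetic by zipping the list with itself shifted by one. This is a sketch of an equivalent formulation, assuming the input is already sorted; `sorted_all_unique` is a hypothetical name.

```python
def sorted_all_unique(sorted_list):
    # Pair each element with its successor; in a sorted list any
    # duplicates must be neighbours.
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))


print(sorted_all_unique([1, 2, 3]), sorted_all_unique([1, 2, 2, 3]))
```

Empty and single-element lists produce no pairs, so all() returns True for them.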

Answer 17

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution. If there are early duplicates in the list, there are some constant-time solutions which are to be preferred.

Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()

Answer 18

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
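
The nested loops compare every pair twice (once as i, i2 and once as i2, i). A possible tightening, sketched with itertools.combinations so that each pair is visited once; it is still quadratic, but like the loop version it needs only ==, so it also works for unhashable or unorderable items. `all_different` is a hypothetical name.

```python
from itertools import combinations


def all_different(s):
    # combinations(s, 2) yields each unordered pair exactly once.
    return all(a != b for a, b in combinations(s, 2))


print(all_different([[1], [2], [1]]), all_different([[1], [2]]))
```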

18 answers (score and timestamp)

Answer 1: 206, accepted, 2011-03-11 20:47:21
Answer 2: 113, 2011-03-12 09:12:25
Answer 3: 21, 2011-03-11 20:50:59
Answer 4: 18, 2012-11-29 20:29:42
Answer 5: 13, 2011-03-11 20:48:56
Answer 6: 8, 2011-03-11 20:50:55
Answer 7: 3, 2012-12-27 04:34:26
Answer 8: 3, 2014-12-14 05:51:03
Answer 9: 2, 2013-04-28 16:12:46
Answer 10: 1, 2012-11-08 09:03:02
Answer 11: 1, 2019-09-12 12:37:19
Answer 12: 1, 2022-03-18 16:59:45
Answer 13: 0, 2016-04-19 22:38:59
Answer 14: 0, 2011-03-11 20:51:09
Answer 15: 0, 2021-11-29 14:15:19
Answer 16: 0, 2022-02-25 15:57:37
Answer 17: 0, 2023-01-03 16:40:31
Answer 18: -3, 2015-11-04 14:37:32
