
Checking if all elements in a list are unique

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach using a Counter is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():  # itervalues() on Python 2
...     if count > 1:
...         pass  # do something

Can I do better?

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
   pass # do something

Probably won't make much of a difference for short lists.
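
As a minimal sketch, the length comparison can be wrapped in a small helper. The name has_duplicates is made up for illustration, and the helper assumes every element is hashable, since set() raises TypeError otherwise:

```python
def has_duplicates(items):
    """Return True if any value occurs more than once (elements must be hashable)."""
    items = list(items)  # materialise so len() works for any iterable
    return len(items) > len(set(items))


print(has_duplicates([1, 1, 1, 2, 3, 4, 5, 6, 2]))  # True
print(has_duplicates("ABCDEF"))                      # False
```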

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
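
If the input may mix hashable and unhashable elements, the two variants could be combined. The following is only a sketch under that assumption, and all_unique_any is a hypothetical name: it uses the fast set lookup where it can and falls back to a list scan when an element raises TypeError.

```python
def all_unique_any(items):
    """Hypothetical helper: use a set when possible, a list otherwise."""
    seen_hashable = set()
    seen_other = []
    for item in items:
        try:
            if item in seen_hashable:
                return False
            seen_hashable.add(item)
        except TypeError:  # item is not hashable, e.g. a list
            if item in seen_other:
                return False
            seen_other.append(item)
    return True


print(all_unique_any([1, "a", [1], 2]))     # True
print(all_unique_any([[1], "a", [1], 2]))   # False
```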

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small cases, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
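
One way to check that expectation on your own machine is a quick timeit comparison. This is a rough sketch, not a benchmark: the list sizes and repeat counts are arbitrary choices, and len_set is just a name for the one-liner.

```python
import timeit


def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True


def len_set(x):
    return len(x) == len(set(x))


no_dups = list(range(10_000))               # worst case for early exit: no duplicate
early_dup = [0, 0] + list(range(10_000))    # duplicate right at the start

for label, data in [("no duplicates", no_dups), ("early duplicate", early_dup)]:
    t_loop = timeit.timeit(lambda: unique_values(data), number=100)
    t_set = timeit.timeit(lambda: len_set(data), number=100)
    print(f"{label}: loop {t_loop:.4f}s, len/set {t_set:.4f}s")
```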

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)

How about adding all the entries to a set and checking its length?

len(set(x)) == len(x)

Instead of a set, you can use a dict:

len({}.fromkeys(x)) == len(x)
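
For what it's worth, fromkeys is a classmethod, so it can also be called as dict.fromkeys(x). Since Python 3.7 a dict keeps insertion order, so the keys double as an order-preserving de-duplicated list. A small sketch with made-up sample data:

```python
x = [3, 1, 3, 2, 1]

keys = dict.fromkeys(x)       # values are all None; only the keys matter
print(len(keys) == len(x))    # False, because 3 and 1 repeat
print(list(keys))             # [3, 1, 2]
```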

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.
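
The same sorted/groupby idea can also report which value repeats. This is a sketch only: first_repeated is a hypothetical name, and like the lambda above it assumes the elements can be sorted.

```python
from itertools import groupby

_MISSING = object()  # sentinel, so that None or 0 in the data are handled correctly


def first_repeated(seq):
    """Hypothetical helper: smallest value occurring more than once, or None."""
    for value, group in groupby(sorted(seq)):
        next(group)                              # every group has at least one member
        if next(group, _MISSING) is not _MISSING:
            return value                         # a second member means a repeat
    return None


print(first_repeated([3, 1, 2, 3, 1]))  # 1
print(first_repeated("ABC"))            # None
```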

Here is a recursive O(N^2) version for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
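
Note that the recursion depth grows with the length of the list, so a list longer than Python's default recursion limit (typically 1000) raises RecursionError. A loop doing the same O(N^2) comparison avoids that. A sketch, with is_unique_iterative as a made-up name:

```python
def is_unique_iterative(lst):
    """Same O(N^2) comparison as the recursive version, written as a loop."""
    for i, item in enumerate(lst):
        if item in lst[i + 1:]:   # compare against everything after position i
            return False
    return True


print(is_unique_iterative(list(range(2000))))  # True
print(is_unique_iterative([1, 2, 1]))          # False
```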

Here is a recursive early-exit function:

def distinct(L):
    # Lists with fewer than two elements cannot contain a duplicate.
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)

It's fast enough for me without using weird (slow) conversions, while keeping a functional-style approach.

How about this

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1]==1
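
If you also want to know which values are duplicated, the same Counter can be filtered. A minimal sketch, where duplicates is a hypothetical helper name:

```python
from collections import Counter


def duplicates(lst):
    """Hypothetical helper: values that occur more than once, with their counts."""
    return {item: n for item, n in Counter(lst).items() if n > 1}


print(duplicates([1, 1, 1, 2, 3, 4, 5, 6, 2]))  # {1: 3, 2: 2}
print(duplicates([1, 2, 3]))                     # {}
```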

All the answers above are good, but I prefer the all_unique example from 30 seconds of python.

Use set() on the given list to remove duplicates, then compare its length with the length of the list.

def all_unique(lst):
  return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False

If and only if you have the data processing library pandas in your dependencies, there's an already implemented solution which gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique

Using a similar approach in a pandas DataFrame to test whether a column contains only unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a DataFrame containing over a million rows.

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark
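
The idfun idea (compare items by a derived marker rather than by the item itself) also works for a direct uniqueness check with early exit. A sketch under that assumption; all_unique_by and its key parameter are made-up names, not part of the linked benchmark code:

```python
def all_unique_by(seq, key=None):
    """Hypothetical helper: True if no two items share the same key(item)."""
    seen = set()
    for item in seq:
        marker = item if key is None else key(item)
        if marker in seen:
            return False      # early exit on the first repeated marker
        seen.add(marker)
    return True


print(all_unique_by(["a", "B", "b"]))                 # True
print(all_unique_by(["a", "B", "b"], key=str.lower))  # False
```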

This does not fully fit the question, but if you google the task I had, this question ranks first, so it might be of interest to readers as an extension of the question. If you want to check for each list element whether it is unique or not, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the list and keep the index
    sort = sorted((e,i) for i,e in enumerate(mylist))
    # check for each element if it is similar to the previous or next one    
    isunique = [[sort[0][1],sort[0][0]!=sort[1][0]]] + \
               [[s[1], (s[0]!=sort[i-1][0])and(s[0]!=sort[i+1][0])] 
                for [i,s] in enumerate (sort) if (i>0) and (i<len(sort)-1) ] +\
               [[sort[-1][1],sort[-1][0]!=sort[-2][0]]]     
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
     return [mylist.count(item)==1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, get_unique_using_count, as suggested in some answers, is fast. But once your list is longer than about 100 elements, the count function takes quite long. The approach shown in get_unique is then much faster, although it looks more complicated.
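
If the elements are hashable, a Counter gives the same per-element flags in a single counting pass, without sorting. This is a sketch, not part of the timings above, and get_unique_using_counter is a made-up name:

```python
from collections import Counter


def get_unique_using_counter(mylist):
    counts = Counter(mylist)   # one pass to count every value
    return [counts[item] == 1 for item in mylist]


print(get_unique_using_counter([4, 7, 4, 9]))  # [False, True, False, True]
```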

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
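
The same neighbour comparison can be written without index arithmetic by zipping the list with itself shifted by one. A sketch; sorted_all_unique is a hypothetical name and the input is assumed to be sorted already:

```python
def sorted_all_unique(sorted_list):
    """Assumes sorted_list is already sorted; compares each element with its successor."""
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))


print(sorted_all_unique([1, 2, 3]))     # True
print(sorted_all_unique([1, 2, 2, 3]))  # False
```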

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution. If there are early duplicates in the list, some of the early-exit solutions run in roughly constant time and are to be preferred.
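
To see why an early duplicate makes the early-exit variants cheap, you can count how many items they actually consume. A small sketch; CountingIter is a made-up helper used only for this illustration:

```python
class CountingIter:
    """Hypothetical wrapper that records how many items were pulled from it."""

    def __init__(self, items):
        self._it = iter(items)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)   # StopIteration propagates when exhausted
        self.consumed += 1
        return item


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


data = CountingIter([0, 0] + list(range(100_000)))
print(set_add_early_exit(data))  # False
print(data.consumed)             # 2
```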

[image]

[image]


Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()

For beginners:

def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
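
The nested loops above compare every pair twice. Starting the inner loop just after the outer index halves the work and makes the i != i2 check unnecessary. A sketch, with all_different as a made-up name:

```python
def all_different(s):
    for i in range(len(s)):
        for j in range(i + 1, len(s)):   # only look at later positions
            if s[i] == s[j]:
                return False
    return True


print(all_different([1, 2, 3]))     # True
print(all_different([1, 2, 3, 1]))  # False
```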

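Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.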

 