Checking if all elements in a list are unique
What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?
My current approach using a Counter is:
>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():  # itervalues() on Python 2
...     if count > 1:
...         pass  # do something
Can I do better?
Not the most efficient, but straightforward and concise:
if len(x) > len(set(x)):
    pass  # do something
Probably won't make much of a difference for short lists.
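If the "do something" step needs to know which values repeat, the length comparison alone won't say. Here is a minimal sketch, assuming hashable elements and Python 3.7+ (where Counter keeps first-seen order); duplicates is a hypothetical helper name, not part of the answer above:

```python
from collections import Counter

def duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [value for value, n in counts.items() if n > 1]

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(len(x) > len(set(x)))  # True
print(duplicates(x))         # [1, 2]
```

An empty result from duplicates(x) means the same thing as len(x) == len(set(x)).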
Here is a two-liner that also exits early:
>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False
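To see why this exits early, it may help to feed it an iterator that never ends. The sketch below restates the function under the hypothetical name all_unique and assumes hashable items:

```python
from itertools import cycle

def all_unique(iterable):
    seen = set()
    # set.add() returns None (falsy), so the `or` records the item and keeps
    # the expression falsy until a repeat makes `i in seen` True.
    return not any(i in seen or seen.add(i) for i in iterable)

# cycle() never ends, yet the check returns at the first repeated item.
print(all_unique(cycle("ABC")))  # False
```

A len(set(...)) comparison could not handle this input at all, since it has to consume the whole iterable first.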
If the elements of x aren't hashable, then you'll have to resort to using a list for seen:
>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False
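One possible way to combine the two variants is to try the set-based check first and fall back to the list-based scan only when an element turns out to be unhashable. This is a sketch, not part of the answer above; all_unique_any is a hypothetical name:

```python
def all_unique_any(items):
    """Set-based check when possible, list-based scan otherwise."""
    items = list(items)
    try:
        return len(set(items)) == len(items)
    except TypeError:  # raised by set() when an element is unhashable
        seen = []
        for item in items:
            if item in seen:  # linear search, so O(n^2) overall
                return False
            seen.append(item)
        return True

print(all_unique_any("ABCDEF"))                                   # True
print(all_unique_any([list("ABC"), list("DEF"), list("ABC")]))    # False
```

The fallback is quadratic, so it is only reasonable for fairly short inputs.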
An early-exit solution could be:
def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True
However, for small inputs, or when an early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
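That expectation is easy to check on your own machine. The following sketch uses the standard timeit module; the list sizes, the number of repetitions, and the helper name len_set are arbitrary choices for illustration, and the timings will vary between systems:

```python
import timeit

def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True

def len_set(x):
    return len(x) == len(set(x))

cases = {
    "all unique": list(range(10_000)),
    "duplicate at front": [0, 0] + list(range(1, 10_000)),
}
for label, data in cases.items():
    # Both functions must agree before their timings are worth comparing.
    assert unique_values(data) == len_set(data)
    for fn in (unique_values, len_set):
        t = timeit.timeit(lambda: fn(data), number=200)
        print(f"{label:20s} {fn.__name__:14s} {t:.4f}s")
```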
For speed:
import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
How about adding all the entries to a set and checking its length?
len(set(x)) == len(x)
Instead of a set, you can use a dict:
len({}.fromkeys(x)) == len(x)
Another approach entirely, using sorted and groupby:
from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))
It requires a sort, but exits on the first repeated value.
Here is a recursive O(N^2) version for fun:
def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True
Here is a recursive early-exit function:
def distinct(L):
    # Lists with fewer than two elements cannot contain a duplicate
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)
It's fast enough for me, avoids weird (slow) conversions, and keeps a functional style.
How about this:
from collections import Counter
def is_unique(lst):
    if not lst:
        return True
    else:
        return Counter(lst).most_common(1)[0][1] == 1
All the answers above are good, but I prefer the all_unique example from 30 seconds of python.
Use set() on the given list to remove duplicates, then compare its length with the length of the list.
def all_unique(lst):
    return len(lst) == len(set(lst))
It returns True if all the values in a flat list are unique, False otherwise.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x) # True
all_unique(y) # False
If and only if you already have the data processing library pandas in your dependencies, there's an existing solution that gives the boolean you want:
import pandas as pd
pd.Series(lst).is_unique
A similar approach works on a pandas DataFrame to test whether a column contains only unique values:
if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")
For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
and do len(x) > len(f5(x)). This will be fast and is also order preserving.
The code is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark
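On Python 3.7 and later, where dicts keep insertion order, the same order-preserving de-duplication can probably be written more compactly. This is a sketch and not taken from the linked benchmark; f5_modern is a hypothetical name:

```python
def f5_modern(seq, idfun=None):
    """Order-preserving de-duplication, keyed by idfun(item) if given."""
    if idfun is None:
        # dict keys are unique and keep insertion order (Python 3.7+)
        return list(dict.fromkeys(seq))
    first_seen = {}
    for item in seq:
        # setdefault keeps the first item stored for each marker
        first_seen.setdefault(idfun(item), item)
    return list(first_seen.values())

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(f5_modern(x))                # [1, 2, 3, 4, 5, 6]
print(len(x) > len(f5_modern(x)))  # True, so x contains duplicates
```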
This does not fully fit the question, but when I googled my own task this question ranked first, so it might interest other users as an extension of the question. If you want to find out, for each list element, whether it is unique or not, you can do the following:
import timeit
import numpy as np
def get_unique(mylist):
    # sort the list and keep the index (assumes at least two elements)
    sort = sorted((e, i) for i, e in enumerate(mylist))
    # check for each element if it is equal to the previous or next one
    isunique = [[sort[0][1], sort[0][0] != sort[1][0]]] + \
               [[s[1], (s[0] != sort[i-1][0]) and (s[0] != sort[i+1][0])]
                for [i, s] in enumerate(sort) if (i > 0) and (i < len(sort)-1)] + \
               [[sort[-1][1], sort[-1][0] != sort[-2][0]]]
    # sort indices and booleans and return only the boolean
    return [a[1] for a in sorted(isunique)]
def get_unique_using_count(mylist):
    return [mylist.count(item) == 1 for item in mylist]
# %timeit is an IPython/Jupyter magic, so run the lines below in such a session
mylist = list(np.random.randint(0, 10, 10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)
mylist = list(np.random.randint(0, 1000, 1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)
For short lists, get_unique_using_count, as suggested in some answers, is fast. But once your list is longer than about 100 elements, the count-based function takes quite long. The approach shown in get_unique is then much faster, although it looks more complicated.
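For hashable elements, a third option might be to count everything once with collections.Counter and then look each element up, which avoids both the repeated list.count() calls and the sort. This is a sketch that was not part of the timings above; get_unique_using_counter is a hypothetical name:

```python
from collections import Counter

def get_unique_using_counter(mylist):
    counts = Counter(mylist)                        # one pass to count
    return [counts[item] == 1 for item in mylist]   # one pass to look up

print(get_unique_using_counter([3, 1, 3, 2]))  # [False, True, False, True]
```

Unlike get_unique, this also handles empty and single-element lists.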
If the list is sorted anyway, you can use:
not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))
Pretty efficient, but not worth sorting just for this purpose.
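The same neighbour comparison can be sketched without index arithmetic by pairing each element with its successor. sorted_all_unique is a hypothetical name, and the input is assumed to be an already sorted list:

```python
def sorted_all_unique(sorted_list):
    # zip() pairs each element with the next one; in sorted input,
    # any duplicate shows up as two equal neighbours.
    return all(a != b for a, b in zip(sorted_list, sorted_list[1:]))

print(sorted_all_unique([1, 2, 3, 4]))   # True
print(sorted_all_unique([1, 2, 2, 3]))   # False
```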
I've compared the suggested solutions with perfplot and found that
len(lst) == len(set(lst))
is indeed the fastest solution. If there are early duplicates in the list, some solutions exit early and run in near-constant time; those are to be preferred.
Code to reproduce the plot:
import perfplot
import numpy as np
import pandas as pd
def len_set(lst):
    return len(lst) == len(set(lst))
def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)
def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)
def numpy_unique(lst):
    return np.unique(lst).size == len(lst)
def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True
def pandas_is_unique(lst):
    return pd.Series(lst).is_unique
def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)
b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)
b.save("out.png")
b.show()
For beginners:
def AllDifferent(s):
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
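The nested loop above compares every pair twice (once as i, i2 and once as i2, i). A small variation, sketched here under the hypothetical name all_different, starts the inner loop after i so each pair is checked only once:

```python
def all_different(s):
    for i in range(len(s)):
        # Only look at positions after i; earlier pairs were already compared.
        for j in range(i + 1, len(s)):
            if s[i] == s[j]:
                return False
    return True

print(all_different([1, 2, 3]))      # True
print(all_different([1, 2, 1]))      # False
```

It is still O(N^2), but it needs no hashing, so it also works for unhashable elements such as lists.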