
Checking if all elements in a list are unique

What is the best way (best as in the conventional way) of checking whether all elements in a list are unique?

My current approach, using a Counter, is:

>>> from collections import Counter
>>> x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
>>> counter = Counter(x)
>>> for count in counter.values():
...     if count > 1:
...         pass  # do something

Can I do better?

Not the most efficient, but straightforward and concise:

if len(x) > len(set(x)):
    pass  # do something

Probably won't make much of a difference for short lists.
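
For example, applied to the list from the question:

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
if len(x) > len(set(x)):
    print("duplicates present")  # printed here: the set collapses the repeated 1s and 2s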

Here is a two-liner that will also do early exit:

>>> def allUnique(x):
...     seen = set()
...     return not any(i in seen or seen.add(i) for i in x)
...
>>> allUnique("ABCDEF")
True
>>> allUnique("ABACDEF")
False

If the elements of x aren't hashable, then you'll have to resort to using a list for seen:

>>> def allUnique(x):
...     seen = list()
...     return not any(i in seen or seen.append(i) for i in x)
...
>>> allUnique([list("ABC"), list("DEF")])
True
>>> allUnique([list("ABC"), list("DEF"), list("ABC")])
False

An early-exit solution could be

def unique_values(g):
    s = set()
    for x in g:
        if x in s: return False
        s.add(x)
    return True

However, for small inputs, or if early exit is not the common case, I would expect len(x) != len(set(x)) to be the fastest method.
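
A rough way to check that expectation with timeit; this is only a sketch, and absolute numbers will vary by machine and data:

import timeit

def unique_values(g):
    s = set()
    for x in g:
        if x in s:
            return False
        s.add(x)
    return True

x = list(range(1000))  # all values unique, so the early exit never triggers

print(timeit.timeit("len(x) != len(set(x))", globals=globals(), number=10_000))
print(timeit.timeit("unique_values(x)", globals=globals(), number=10_000))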

For speed:

import numpy as np
x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
np.unique(x).size == len(x)
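
For example (note that np.unique sorts a copy of the input, so this is O(n log n) and never exits early):

import numpy as np

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(np.unique(x).size == len(x))  # False: duplicates were collapsed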

How about adding all entries to a set and checking its length?

len(set(x)) == len(x)

As an alternative to set, you can use dict:

len({}.fromkeys(x)) == len(x)
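
For example, the same check written with dict.fromkeys (equivalent to the {}.fromkeys spelling above):

x = [1, 1, 1, 2, 3, 4, 5, 6, 2]
print(len(dict.fromkeys(x)) == len(x))  # False: duplicate values collapse onto single keys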

Another approach entirely, using sorted and groupby:

from itertools import groupby
is_unique = lambda seq: all(sum(1 for _ in x[1])==1 for x in groupby(sorted(seq)))

It requires a sort, but exits on the first repeated value.
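
For example (strictly speaking, the all() call stops at the first group with more than one element, but only after the full sort has been paid for):

from itertools import groupby

is_unique = lambda seq: all(sum(1 for _ in x[1]) == 1 for x in groupby(sorted(seq)))

print(is_unique([1, 2, 3, 4]))  # True
print(is_unique([3, 1, 2, 3]))  # False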

Here is a recursive O(N²) version, just for fun:

def is_unique(lst):
    if len(lst) > 1:
        return is_unique(lst[1:]) and (lst[0] not in lst[1:])
    return True

Here is a recursive early-exit function:

def distinct(L):
    if len(L) < 2:
        return True
    H = L[0]
    T = L[1:]
    if H in T:
        return False
    return distinct(T)

It's fast enough for me, avoids slow conversions, and keeps a functional-style approach.

How about this:

from collections import Counter

def is_unique(lst):
    if not lst:
        return True
    return Counter(lst).most_common(1)[0][1] == 1
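
To illustrate what most_common(1)[0][1] inspects (note that the whole list is counted up front, so there is no early exit):

from collections import Counter

print(Counter([1, 2, 3]).most_common(1))     # [(1, 1)] -> the most frequent value appears once
print(Counter([1, 2, 2, 3]).most_common(1))  # [(2, 2)] -> a duplicate exists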

All the answers above are good, but I prefer to use the all_unique example from 30 seconds of Python.

It uses set() on the given list to remove duplicates, then compares its length with the length of the list.

def all_unique(lst):
    return len(lst) == len(set(lst))

It returns True if all the values in a flat list are unique, False otherwise.

x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 2, 3, 4, 5]
all_unique(x)  # True
all_unique(y)  # False

If you already have the data-processing library pandas in your dependencies, there's an already-implemented solution which gives the boolean you want:

import pandas as pd
pd.Series(lst).is_unique
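
For example:

import pandas as pd

print(pd.Series([1, 2, 3, 4]).is_unique)  # True
print(pd.Series([1, 2, 2, 3]).is_unique)  # False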

Using a similar approach in a Pandas dataframe to test whether the contents of a column contain unique values:

if tempDF['var1'].size == tempDF['var1'].unique().size:
    print("Unique")
else:
    print("Not unique")

For me, this is instantaneous on an int variable in a dataframe containing over a million rows.
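
The same column check can also be written with the Series.is_unique accessor shown above; a small sketch with a hypothetical tempDF:

import pandas as pd

# Hypothetical frame standing in for tempDF from the snippet above.
tempDF = pd.DataFrame({"var1": [1, 2, 3, 3]})

if tempDF["var1"].is_unique:   # equivalent to comparing .size with .unique().size
    print("Unique")
else:
    print("Not unique")        # printed here, since 3 appears twice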

You can use Yan's syntax (len(x) > len(set(x))), but instead of set(x), define a function:

def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result

and do len(x) > len(f5(x)). This will be fast and is also order-preserving.

Code there is taken from: http://www.peterbe.com/plog/uniqifiers-benchmark

This does not fully fit the question, but if you search for the task I had, this question comes up first, so it may be of interest to other users as an extension of the question. If you want to check, for each list element, whether it is unique or not, you can do the following:

import timeit
import numpy as np

def get_unique(mylist):
    # sort the elements and keep their original indices
    sort = sorted((e, i) for i, e in enumerate(mylist))
    # check for each element whether it equals the previous or the next one
    isunique = [[sort[0][1], sort[0][0] != sort[1][0]]] + \
               [[s[1], (s[0] != sort[i - 1][0]) and (s[0] != sort[i + 1][0])]
                for i, s in enumerate(sort) if 0 < i < len(sort) - 1] + \
               [[sort[-1][1], sort[-1][0] != sort[-2][0]]]
    # restore the original order and return only the booleans
    return [a[1] for a in sorted(isunique)]


def get_unique_using_count(mylist):
    return [mylist.count(item) == 1 for item in mylist]

mylist = list(np.random.randint(0,10,10))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

mylist = list(np.random.randint(0,1000,1000))
%timeit for x in range(10): get_unique(mylist)
%timeit for x in range(10): get_unique_using_count(mylist)

For short lists, the get_unique_using_count approach suggested in some answers is fast. But once the list grows beyond about 100 elements, the count-based function takes quite a long time, since each count call scans the whole list (O(n²) overall). The approach shown in get_unique is much faster, although it looks more complicated.
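
As an alternative sketch (not from the original answer): a single Counter pass gives the same per-element booleans in O(n), without the sorting bookkeeping of get_unique or the O(n²) cost of list.count:

from collections import Counter

def get_unique_using_counter(mylist):
    # one pass to count, one pass to look up: O(n) overall
    counts = Counter(mylist)
    return [counts[item] == 1 for item in mylist]

print(get_unique_using_counter([1, 1, 2, 3]))  # [False, False, True, True]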

If the list is sorted anyway, you can use:

not any(sorted_list[i] == sorted_list[i + 1] for i in range(len(sorted_list) - 1))

Pretty efficient, but not worth sorting just for this purpose.
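
The same adjacent-pair check can also be written with zip (or itertools.pairwise on Python 3.10+), avoiding the explicit indexing; a small sketch:

sorted_list = [1, 2, 3, 3, 4]

# True if no adjacent pair in the sorted list is equal.
print(not any(a == b for a, b in zip(sorted_list, sorted_list[1:])))  # False: 3 repeats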

I've compared the suggested solutions with perfplot and found that

len(lst) == len(set(lst))

is indeed the fastest solution. If there are early duplicates in the list, the early-exit solutions (which then finish in near-constant time) are preferable.

[Benchmark plots omitted: "All items unique" and "All items equal", generated by the code below.]

Code to reproduce the plot:

import perfplot
import numpy as np
import pandas as pd


def len_set(lst):
    return len(lst) == len(set(lst))


def set_add(lst):
    seen = set()
    return not any(i in seen or seen.add(i) for i in lst)


def list_append(lst):
    seen = list()
    return not any(i in seen or seen.append(i) for i in lst)


def numpy_unique(lst):
    return np.unique(lst).size == len(lst)


def set_add_early_exit(lst):
    s = set()
    for item in lst:
        if item in s:
            return False
        s.add(item)
    return True


def pandas_is_unique(lst):
    return pd.Series(lst).is_unique


def sort_diff(lst):
    return not np.any(np.diff(np.sort(lst)) == 0)


b = perfplot.bench(
    setup=lambda n: list(np.arange(n)),
    title="All items unique",
    # setup=lambda n: [0] * n,
    # title="All items equal",
    kernels=[
        len_set,
        set_add,
        list_append,
        numpy_unique,
        set_add_early_exit,
        pandas_is_unique,
        sort_diff,
    ],
    n_range=[2**k for k in range(18)],
    xlabel="len(lst)",
)

b.save("out.png")
b.show()

For beginners:

def AllDifferent(s):
    # compare every pair of positions; O(n^2), but easy to follow
    for i in range(len(s)):
        for i2 in range(len(s)):
            if i != i2:
                if s[i] == s[i2]:
                    return False
    return True
