Consider a Python list my_list
containing ['foo', 'foo', 'bar']
.
What is the most Pythonic way to uniquify and sort a list ?
(think cat my_list | sort | uniq
)
This is how I currently do it and while it works I'm sure there are better ways to do it.
my_list = []
...
my_list.append("foo")
my_list.append("foo")
my_list.append("bar")
...
my_list = set(my_list)
my_list = list(my_list)
my_list.sort()
my_list = sorted(set(my_list))
# Python ≥ 2.4
# because of (generator expression) and itertools.groupby, sorted
import itertools
def sort_uniq(sequence):
return (x[0] for x in itertools.groupby(sorted(sequence)))
Faster:
import itertools, operator
import sys
if sys.hexversion < 0x03000000:
mapper= itertools.imap # 2.4 ≤ Python < 3
else:
mapper= map # Python ≥ 3
def sort_uniq(sequence):
return mapper(
operator.itemgetter(0),
itertools.groupby(sorted(sequence)))
Both versions return an generator, so you might want to supply the result to the list type:
sequence= list(sort_uniq(sequence))
Note that this will work with non-hashable items too:
>>> list(sort_uniq([[0],[1],[0]]))
[[0], [1]]
The straightforward solution is provided by Ignacio— sorted(set(foo))
.
If you have unique data, there's a reasonable chance you don't just want to do sorted(set(...))
but rather to store a set all the time and occasionally pull out a sorted version of the values. (At that point, it starts sounding like the sort of thing people often use a database for, too.)
If you have a sorted list and you want to check membership on logarithmic and add an item in worst case linear time, you can use the bisect
module .
If you want to keep this condition all the time and you want to simplify things or make some operations perform better, you might consider blist.sortedset
.
Others have mentioned sorted(set(my_list)), which works for hashable values such as strings, numbers and tuples, but not for unhashable types such as lists.
To get a sorted list of values of any sortable type, without duplicates:
from itertools import izip, islice
def unique_sorted(values):
"Return a sorted list of the given values, without duplicates."
values = sorted(values)
if not values:
return []
consecutive_pairs = izip(values, islice(values, 1, len(values)))
result = [a for (a, b) in consecutive_pairs if a != b]
result.append(values[-1])
return result
This can be further simplified using the "pairwise" or "unique_justseen" recipes from the itertools documentation .
不能说这是干净的方式,但只是为了好玩:
my_list = [x for x in sorted(my_list) if not x in locals()["_[1]"]]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.