Best / most pythonic way to get an ordered list of unique items

I have one or more unordered sequences of (immutable, hashable) objects with possible duplicates and I want to get a sorted sequence of all those objects without duplicates.

Right now I'm using a set to quickly gather all the elements while discarding duplicates, then converting it to a list and sorting that:

result = set()
for s in sequences:
    result = result.union(s)
result = list(result)
result.sort()
return result

It works but I wouldn't call it "pretty". Is there a better way?

This should work:

sorted(set(itertools.chain.from_iterable(sequences)))
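For context, here is a minimal, self-contained sketch of that one-liner wrapped in a function. The name unique_sorted and the sample data are illustrative assumptions, not part of the original answer.

```python
from itertools import chain


def unique_sorted(sequences):
    """Return a sorted list of the distinct items found in any of the sequences.

    `unique_sorted` is a hypothetical helper name used only for illustration.
    """
    # chain.from_iterable flattens the sequences lazily, set() drops
    # duplicates, and sorted() returns a new sorted list.
    return sorted(set(chain.from_iterable(sequences)))


sample = [[3, 1, 2], (2, 5), [1, 4, 3]]  # made-up input
print(unique_sorted(sample))  # [1, 2, 3, 4, 5]
```

The items only need to be hashable (for the set) and mutually comparable (for the sort).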

I like your code just fine. It is straightforward and easy to understand.

We can shorten it just a little bit by replacing the list() and .sort() steps with a single call to sorted(), which accepts any iterable and returns a new sorted list:

result = set()
for s in sequences:
    result = result.union(s)
return sorted(result)

I really have no desire to try to boil it down beyond that, but you could do it with reduce():

from functools import reduce  # needed on Python 3; reduce() is a builtin on Python 2

result = reduce(lambda s, x: s.union(x), sequences, set())
return sorted(result)

Personally, I think this is harder to understand than the above, but people steeped in functional programming might prefer it.

EDIT: @agf is much better at this reduce() stuff than I am. From the comments below:

return sorted(reduce(set().union, sequences))

I had no idea this would work. If I understand it correctly, we are giving reduce() a callable that is a bound method of one set() instance (call it x for the sake of discussion, but note that Python does not actually bind the name x to this object). Because no initial value is given, reduce() first calls the method with the first two iterables from sequences, that is, x.union(first, second). Since x is empty, this returns a new set holding the elements of both iterables; .union() never modifies x itself. Then reduce() repeatedly calls x.union(accumulated, next_iterable), where accumulated is the set returned by the previous call, and each call returns another new set. So every step copies the accumulated set, just as result = result.union(s) does in the loop version. Finally, reduce() returns the last of these sets, and we have the set we want. Two edge cases are worth knowing: if sequences is empty, reduce() raises TypeError, and if it contains exactly one iterable, reduce() returns that iterable unchanged without ever calling .union(), so duplicates are not removed.
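The behaviour of the bound-method form can be checked with a short experiment. This is only a sketch with made-up inputs; the variable names are illustrative.

```python
from functools import reduce  # Python 3; reduce() is a builtin in Python 2

x = set()
merged = reduce(x.union, [[3, 1], [1, 2], [2, 5]])
print(sorted(merged))  # [1, 2, 3, 5]
print(x)               # set(): union() returned new sets and left x untouched

# With exactly one input, reduce() hands it back as-is, so duplicates survive.
single = reduce(set().union, [[3, 1, 3]])
print(single)          # [3, 1, 3]

# With no inputs and no initial value, reduce() raises TypeError.
try:
    reduce(set().union, [])
except TypeError:
    empty_raises = True
else:
    empty_raises = False
print(empty_raises)    # True
```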

This is a bit tricky for my personal taste. I had to think this through to understand it, while the itertools.chain() solution made sense to me right away.

EDIT: @agf made it less tricky:

return sorted(reduce(set.union, sequences, set()))

What this is doing is much simpler to understand! If we again call the instance returned by set() by the name x (with the same understanding as above that Python does not actually bind the name x to this instance), and if we use the name n to refer to each "next" value from sequences, then reduce() first calls set.union(x, n), which is exactly the same thing as x.union(n). Each later call passes the set returned by the previous call in place of x. Because an initial value is supplied, this version behaves correctly when sequences is empty or holds a single iterable. IMHO if you want a reduce() solution, this is the best one.
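A small sketch of that equivalence, plus the one-liner wrapped in a function so it can be run. The name unique_sorted_reduce and the inputs are hypothetical, chosen only for illustration.

```python
from functools import reduce  # Python 3; reduce() is a builtin in Python 2

x = set()
n = [2, 1, 2]
# Calling the method through the class is the same as calling it on the instance.
print(set.union(x, n) == x.union(n))  # True


def unique_sorted_reduce(sequences):
    """Hypothetical wrapper around the reduce() one-liner."""
    # The third argument, set(), is the initial accumulator value.
    return sorted(reduce(set.union, sequences, set()))


print(unique_sorted_reduce([[3, 1], [1, 2], [2, 5]]))  # [1, 2, 3, 5]
print(unique_sorted_reduce([[3, 1, 3]]))               # [1, 3]
print(unique_sorted_reduce([]))                        # []
```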

---

If you want it to be fast, ask yourself: is there any way we can apply itertools to this? There is a pretty good way:

from itertools import chain
return sorted(set(chain(*sequences)))

itertools.chain() called with *sequences serves to "flatten" the list of lists into a single iterable. It's a little bit tricky, but only a little bit, and it's a common idiom.

EDIT: As @Jbernardo wrote in the most popular answer, and as @agf observes in the comments, itertools.chain has an alternate constructor, chain.from_iterable(), and the documentation says it evaluates its iterable argument lazily. The * notation forces Python to unpack the whole outer iterable into an argument tuple before chain() is even called, which may consume considerable memory if the iterable is long. In fact, you could have a never-ending generator of sequences, and with itertools.chain.from_iterable() you would be able to pull values from it for as long as you want to run your program, while the * notation would just run out of memory.
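The laziness can be seen with a never-ending generator. This is a sketch only; endless_sequences is a hypothetical generator invented for the demonstration.

```python
from itertools import chain, count, islice


def endless_sequences():
    """Hypothetical never-ending generator: yields [0, 1], [1, 2], [2, 3], ..."""
    for i in count():
        yield [i, i + 1]


# from_iterable pulls one inner sequence at a time, so taking a finite
# slice of the flattened stream terminates.
flat = chain.from_iterable(endless_sequences())
first_six = list(islice(flat, 6))
print(first_six)  # [0, 1, 1, 2, 2, 3]

# chain(*endless_sequences()) would first try to unpack the entire
# generator into an argument tuple, so it would never return.
```

Note that sorted(set(...)) still has to consume its whole input, so the full one-liner needs a finite collection of sequences either way; the benefit of from_iterable() there is avoiding the intermediate argument tuple.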

As @Jbernardo wrote:

sorted(set(itertools.chain.from_iterable(sequences)))

This is the best answer, and I already upvoted it.
