简体   繁体   English

哪种序列类型更适合比较,为什么? (Python)

[英]Which sequence type is better for a comparison and why? (Python)

I have a condition that compares one object to several others, like so:我有一个条件,将一个 object 与其他几个进行比较,如下所示:

if 'a' in ('a','b','c','e'):

The sequence was created for this purpose and doesn't exist anywhere else in the function.该序列是为此目的而创建的,在 function 的其他任何地方都不存在。 What are the pros and cons to grouping it as a tuple, list, or set, given that they all seem to work the same and the list is short?考虑到它们似乎都工作相同并且列表很短,将其分组为元组、列表或集合的优点和缺点是什么? Which would be idiomatic?哪个是惯用的?

Use a set until you have good reason not to.使用一套,直到你有充分的理由不这样做。 (And then use a list.) (然后使用列表。)

I would consider a set to be more idiomatic.我会认为一组更惯用。 It conveys the meaning more clearly, since order doesn't matter, only membership.它更清楚地传达了含义,因为顺序无关紧要,只有成员资格。

And to be clear, a set is a collection but not a "sequence type" (even though it's iterable), because it's semantically "unordered".需要明确的是,集合是一个集合,但不是“序列类型”(即使它是可迭代的),因为它在语义上是“无序的”。


Why not use a set?为什么不使用一套?

Sets may only contain hashable types.集合只能包含可散列类型。 And, this is important, they will raise a TypeError instead of simply returning False when you ask if an unhashable type is in the set.而且,这很重要,当您询问集合中是否存在不可散列的类型时,它们会引发TypeError而不是简单地返回False If you might get an unhashable object on either side of the in operator, you're out of luck.如果您可能在in运算符的任一侧得到不可散列的 object,那么您就不走运了。 Sometimes you can use hashable elements instead (like frozenset instead of set or tuple instead of list ), sometimes you can't.有时您可以使用可散列元素(例如frozenset代替settuple代替list ),有时则不能。

But tuples and lists don't have to hash their elements.但是元组和列表不必 hash 它们的元素。


Why a list over a tuple?为什么要在元组上列出列表?

The main advantage of a list that they avoid a syntactic quirk for tuples of one element.列表的主要优点是它们避免了一个元素的元组的语法怪癖。 Say you have ('foo', 'bar') and later decide to remove the 'bar' .假设您有('foo', 'bar') ,后来决定删除'bar' Then you have ('foo') .然后你有('foo') Oops, see what I did there?哎呀,看看我在那里做了什么? It was actually supposed to be ('foo',) .它实际上应该是('foo',) It's easy to forget the comma.很容易忘记逗号。 And the in check still works for strings like ('foo') , since in checks for substrings.并且in检查仍然适用于('foo')这样的字符串,因为in检查子字符串。 This can subtly change the meaning of your program.这可以巧妙地改变程序的含义。 'oo' is in ('foo') , but not in ('foo',) . 'oo'('foo')中,但不在('foo',)中。

A one-item list like ['foo'] doesn't have that problem.['foo']这样的单项列表没有这个问题。 [And as user2357112 pointed out, a constant list is going to get compiled to a tuple anyway.] [正如 user2357112 指出的那样,一个常量列表无论如何都会被编译成一个元组。]

Note that a one-item set, like {'a'} doesn't have that problem either.请注意,像{'a'}这样的单项集也没有这个问题。 An empty {} is a dict instead, but that's not going to cause any issues with an in check because it's also an empty collection.一个空的{}是一个 dict,但这不会导致in检查出现任何问题,因为它也是一个空集合。

But you should arguably be using == instead of in when comparing against only one element.但是当只与一个元素进行比较时,可以说你应该使用==而不是in


That's it for clarity.就是为了清楚起见。 Now for the micro-optimizations.现在进行微优化。 Early optimization is the root of all evil.早期优化是万恶之源。 Don't optimize at the expense of readability before it's actually necessary.在实际需要之前,不要以牺牲可读性为代价进行优化。

A set lookup is faster if it's not too small, since a tuple's elements have to be checked one-by-one which (on average) grows with the size of the tuple, while a set is backed by a hashtable (like a dict), which has a small constant overhead.如果不是太小,集合查找会更快,因为必须逐个检查元组的元素,这(平均)随着元组的大小而增长,而集合由哈希表(如字典)支持,它的开销很小。 If the distribution of cases isn't uniform, this means that the order of elements in the tuple matters a lot.如果案例的分布不均匀,这意味着元组中元素的顺序很重要。 Putting the more common cases first in the tuple will make the checks much faster than the reverse, on average.平均而言,将更常见的情况放在元组中会使检查比相反的情况快得多。

How small does the collection have to be to for the set's constant overhead to matter?集合必须有多小才能使集合的持续开销重要? Profile and see.配置文件并查看。 Performance can vary based on a lot of factors.性能可能会因许多因素而异。 It's not just the number of elements, but how long an equality check takes, and where they're located in memory, etc.这不仅仅是元素的数量,而是相等检查需要多长时间,以及它们在 memory 中的位置等。

A tuple should have a slightly smaller overhead both in memory and construction time than the other collections.一个元组在 memory 和构造时间上的开销都应该比另一个 collections 的开销略小。 But the construction overhead doesn't really matter if the compiler can make it load as a saved constant value.但是,如果编译器可以将其加载为保存的常量值,则构造开销并不重要。 (This can happen when all the elements are themselves constant at compile time. You can use the dis module to confirm this is happening.) (当所有元素本身在编译时都保持不变时,就会发生这种情况。您可以使用dis模块来确认这种情况正在发生。)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM