Which sequence type is better for a comparison and why? (Python)

Question

I have a condition that compares one object to several others, like so:

if 'a' in ('a','b','c','e'):

The sequence was created for this purpose and doesn't exist anywhere else in the function. What are the pros and cons to grouping it as a tuple, list, or set, given that they all seem to work the same and the list is short? Which would be idiomatic?

Answer 1

Use a set until you have good reason not to. (And then use a list.)

I would consider a set to be more idiomatic. It conveys the meaning more clearly, since order doesn't matter, only membership.

And to be clear, a set is a collection but not a "sequence type" (even though it's iterable), because it's semantically "unordered".

Why not use a set?

Sets may only contain hashable types. And, importantly, they raise a TypeError instead of simply returning False when you ask whether an unhashable object is in the set. If you might get an unhashable object on either side of the in operator, you're out of luck. Sometimes you can use hashable elements instead (like frozenset instead of set, or tuple instead of list); sometimes you can't.

But tuples and lists don't have to hash their elements.
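A minimal sketch of that difference. The names allowed_set, allowed_list and candidate are made up for illustration:

```python
allowed_set = {'a', 'b', 'c', 'e'}
allowed_list = ['a', 'b', 'c', 'e']

candidate = ['a']  # a list is unhashable

# A list (or tuple) only compares elements with ==, so this is simply False.
in_list = candidate in allowed_list

# A set has to hash the candidate first, which fails for a list.
try:
    in_set = candidate in allowed_set
except TypeError as exc:
    in_set = None
    error_message = str(exc)

print(in_list)
print(error_message)
```

The exact wording of the error message can differ between Python versions, so don't rely on it.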

Why a list over a tuple?

The main advantage of a list is that it avoids a syntactic quirk of one-element tuples. Say you have ('foo', 'bar') and later decide to remove the 'bar'. Then you have ('foo'). Oops, see what I did there? It was actually supposed to be ('foo',). It's easy to forget the comma. And the in check still works on ('foo'), which is just the string 'foo', since in checks for substrings when the right-hand side is a string. This can subtly change the meaning of your program: 'oo' is in ('foo'), but not in ('foo',).

A one-item list like ['foo'] doesn't have that problem. [And as user2357112 pointed out, a constant list is going to get compiled to a tuple anyway.]

Note that a one-item set, like {'a'}, doesn't have that problem either. An empty {} is a dict rather than a set, but that's not going to cause any issues with an in check, because it's also an empty collection.
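Here is a short sketch of the one-element cases described above. The variable names are only for illustration:

```python
no_comma = ('foo')      # parentheses only: this is just the string 'foo'
with_comma = ('foo',)   # the trailing comma makes it a one-element tuple

print(type(no_comma).__name__)
print(type(with_comma).__name__)

# Substring test against a string vs. membership test against a collection.
substring_hit = 'oo' in no_comma
tuple_hit = 'oo' in with_comma
list_hit = 'oo' in ['foo']
set_hit = 'oo' in {'foo'}

# An empty {} is a dict, but an in check against it is still just False.
empty_braces = {}
empty_hit = 'oo' in empty_braces

print(substring_hit, tuple_hit, list_hit, set_hit, empty_hit)
```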

But you should arguably be using == instead of in when comparing against only one element.

That's it for clarity. Now for the micro-optimizations. Premature optimization is the root of all evil. Don't optimize at the expense of readability before it's actually necessary.

A set lookup is faster once the collection is not too small. A tuple's elements have to be checked one by one, so the average cost grows with the size of the tuple, while a set is backed by a hash table (like a dict), whose lookup has a small, roughly constant overhead. For a tuple, this also means that if the distribution of cases isn't uniform, the order of the elements matters a lot. Putting the more common cases first will make the checks much faster, on average, than putting them last.

How small does the collection have to be for the set's constant overhead to matter? Profile and see. Performance can vary based on a lot of factors. It's not just the number of elements, but also how long an equality check takes, where the elements are located in memory, and so on.
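One way to "profile and see" is the standard timeit module. This is only a sketch: the helper time_lookup and the labels are made up for illustration, and the numbers will differ between machines and Python versions, so no expected timings are shown.

```python
import timeit

tuple_choices = ('a', 'b', 'c', 'e')
set_choices = {'a', 'b', 'c', 'e'}


def time_lookup(needle, container, number=200_000):
    """Return the total seconds taken by `number` membership tests."""
    return timeit.timeit(
        'needle in container',
        globals={'needle': needle, 'container': container},
        number=number,
    )


results = {
    'tuple, first element': time_lookup('a', tuple_choices),
    'tuple, last element': time_lookup('e', tuple_choices),
    'tuple, missing': time_lookup('z', tuple_choices),
    'set, missing': time_lookup('z', set_choices),
}

for label, seconds in results.items():
    print(f'{label:22s} {seconds:.4f} s')
```

Comparing the first-element and last-element rows gives a rough idea of how much element order matters for a tuple of this size.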

A tuple should have a slightly smaller overhead both in memory and construction time than the other collections. But the construction overhead doesn't really matter if the compiler can make it load as a saved constant value. (This can happen when all the elements are themselves constant at compile time. You can use the dis module to confirm this is happening.)
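A sketch of checking this with dis. The functions check_list and check_set are made up for illustration. On the CPython versions I'm aware of, the literal list is stored as a tuple constant and the literal set as a frozenset constant, but that is an implementation detail, so confirm it on your own interpreter.

```python
import dis


def check_list(x):
    # Every element is a compile-time constant.
    return x in ['a', 'b', 'c', 'e']


def check_set(x):
    return x in {'a', 'b', 'c', 'e'}


# Constants the compiler saved for each function.
list_consts = check_list.__code__.co_consts
set_consts = check_set.__code__.co_consts
print(list_consts)
print(set_consts)

# Full bytecode listing: look for a single constant load
# instead of the collection being rebuilt on every call.
dis.dis(check_list)
```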
