简体   繁体   中英

Why does Python treat tuples, lists, sets and dictionaries as fundamentally different things?

One of the reasons I love Python is the expressive power / reduced programming effort provided by tuples, lists, sets and dictionaries. Once you understand list comprehensions and a few of the basic patterns using in and for , life gets so much better! Python rocks.

However I do wonder why these constructs are treated as differently as they are, and how this is changing (getting stranger) over time. Back in Python 2.x, I could've made an argument they were all just variations of a basic collection type, and that it was kind of irritating that some non-exotic use cases require you to convert a dictionary to a list and back again. (Isn't a dictionary just a list of tuples with a particular uniqueness constraint? Isn't a list just a set with a different kind of uniqueness constraint?).

Now in the 3.x world, it's gotten more complicated. There are now named tuples -- starting to feel more like a special-case dictionary. There are now ordered dictionaries -- starting to feel more like a list. And I just saw a recipe for ordered sets. I can picture this going on and on ... what about unique lists, etc.

The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It seems to me this profusion of specialized collections types is in conflict with this Python precept.

What do the hardcore Pythonistas think?

These data types all serve different purposes, and in an ideal world you might be able to unify them more. However, in the real world we need to have efficient implementations of the basic collections, and eg ordering adds a runtime penalty.

The named tuples mainly serve to make the interface of stat() and the like more usable, and also can be nice when dealing with SQL row sets.

The big unification you're looking for is actually there, in the form of the different access protocols (getitem, getattr, iter, ...), which these types mix and match for their intended purposes.

tl;dr (duck-typing)

You're correct to see some similarities in all these data structures. Remember that python uses duck-typing (if it looks like a duck and quacks like a duck then it is a duck). If you can use two objects in the same situation then, for your current intents and purposes, they might as well be the same data type. But you always have to keep in mind that if you try to use them in other situations, they may no longer behave the same way.

With this in mind we should take a look at what's actually different and the same about the four data types you mentioned, to get a general idea of the situations where they are interchangeable.

Mutability (can you change it?)

You can make changes to dictionaries, lists, and sets. Tuples cannot be "changed" without making a copy.

  • Mutable: dict , list , set

    Immutable: tuple

Python string is also an immutable type. Why do we want some immutable objects? I would paraphrase from this answer:

  1. Immutable objects can be optimized a lot

  2. In Python, only immutables are hashable (and only hashable objects can be members of sets, or keys in dictionaries).

Comparing across this property, lists and tuples seem like the "closest" two data types. At a high-level a tuple is an immutable "freeze-frame" version of a list. This makes lists useful for data sets that will be changing over time (since you don't have to copy a list to modify it) but tuples useful for things like dictionary keys (which must be immutable types).

Ordering (and a note on abstract data types)

A dictionary, like a set, has no inherent conceptual order to it. This is in contrast to lists and tuples, which do have an order. The order for the items in a dict or a set is abstracted away from the programmer, meaning that if element A comes before B in a for k in mydata loop, you shouldn't (and can't generally) rely on A being before B once you start making changes to mydata .

  • Order-preserving: list , tuple

    Non-order-preserving: dict , set

Technically if you iterate over mydata twice in a row it'll be in the same order, but this is more a convenient feature of the mechanics of python, and not really a part of the set abstract data type (the mathematical definition of the data type). Lists and tuples do guarantee order though, especially tuples which are immutable.

What you see when you iterate (if it walks like a duck...)

  • One "item" per "element": set , list , tuple

    Two "items" per "element": dict

I suppose here you could see a named tuple, which has both a name and a value for each element, as an immutable analogue of a dictionary. But this is a tenuous comparison- keep in mind that duck-typing will cause problems if you're trying to use a dictionary-only method on a named tuple, or vice-versa.

Direct responses to your questions

Isn't a dictionary just a list of tuples with a particular uniqueness constraint?

No, there are several differences. Dictionaries have no inherent order, which is different from a list, which does.

Also, a dictionary has a key and a value for each "element". A tuple, on the other hand, can have an arbitrary number of elements, but each with only a value.

Because of the mechanics of a dictionary, where keys act like a set, you can look up values in constant time if you have the key. In a list of tuples (pairs here), you would need to iterate through the list until you found the key, meaning search would be linear in the number of elements in your list.

Most importantly, though, dictionary items can be changed, while tuples cannot.

Isn't a list just a set with a different kind of uniqueness constraint?

Again, I'd stress that sets have no inherent ordering, while lists do. This makes lists much more useful for representing things like stacks and queues, where you want to be able to remember the order in which you appended items. Sets offer no such guarantee. However they do offer the advantage of being able to do membership lookups in constant time, while again lists take linear time.

There are now named tuples -- starting to feel more like a special-case dictionary. There are now ordered dictionaries -- starting to feel more like a list. And I just saw a recipe for ordered sets. I can picture this going on and on ... what about unique lists, etc.

To some degree I agree with you. However data structure libraries can be useful to support common use-cases for already well-established data structures. This keep the programmer from wasting time trying to come up with custom extensions to the standard structures. As long as it doesn't get out of hand, and we can still see the unique usefulness in each solution, it's good to have a wheel on the shelf so we don't need to reinvent it.

A great example is the Counter() class. This specialized dictionary has been of use to me more times than I can count (badoom-tshhhhh!) and it has saved me the effort of coding up a custom solution. I'd much rather have a solution that the community is helping me to develop and keep with proper python best-practices than something that sits around in my custom data structures folder and only gets used once or twice a year.

First of all, Ordered Dictionaries and Named Tuples were introduced in Python 2, but that's beside the point.

I won't point you at the docs since if you were really interested you would have read them already.

The first difference between collection types is mutability. tuple and frozenset are immutable types. This means they can be more efficient than list or set .

If you want something you can access randomly or in order, but will mainly change at the end, you want a list . If you want something you can also change at the beginning, you want a deque .

You simply can't have your cake and eat it too -- every feature you add causes you to lose some speed.

dict and set are fundamentally different from lists and tuples`. They store the hash of their keys, allowing you to see if an item is in them very quickly, but requires the key be hashable. You don't get the same membership testing speed with linked lists or arrays.

When you get to OrderedDict and NamedTuple , you're talking about subclasses of the builtin types implemented in Python, rather than in C. They are for special cases, just like any other code in the standard library you have to import . They don't clutter up the namespace but are nice to have when you need them.

One of these days, you'll be coding, and you'll say, "Man, now I know exactly what they meant by 'There should be one-- and preferably only one --obvious way to do it', a set is just what I needed for this, I'm so glad it's part of the Python language! If I had to use a list, it would take forever ." That's when you'll understand why these different types exist.

A dictionary is indexed by key (in fact, it's a hash map); a generic list of tuples won't be. You might argue that both should be implemented as relations, with the ability to add indices at will, but in practice having optimized types for the common use cases is both more convenient and more efficient.

New specialized collections get added because they are common enough that lots of people would end up implementing them using more basic data types, and then you'd have the usual problems with wheel reinvention (wasted effort, lack of interoperability...). And if Python just offered an entirely generic construct, then we'd get lots of people asking "how do I implement a set using a relation", etc.

(btw, I'm using relation in the mathematical or DB sense)

All of these specialized collection types provide specific functionalities that are not adequately or efficiently provided by the "standard" data types of list, tuple, dict, and set.

For example, sometimes you need a collection of unique items, and you also need to retain the order in which you encountered them. You can do this using a set to keep track of membership and a list to keep track of order, but your solution will probably be slower and more memory-hungry than a specialized data structure designed for exactly this purpose, such as an ordered set.

These additional data types, which you see as combinations or variations on the basic ones, actually fill gaps in functionality left by the basic data types. From a practical perspective, if Python's core or standard library did not provide these data types, then anyone who needed them would invent their own inefficient versions. They are used less often than the basic types, but often enough to make it worth while to provide standard implementations.

One of the things I like in Python the most is agility. And a lot of functional, effective and usable collections types gives it to me.

And there is still one way to do this - each type does its own job.

The world of data structures (language agnostic) can generally be boiled down to a few small basic structures - lists, trees, hash-tables and graphs, etc. and variants and combinations thereof. Each has its own specific purpose in terms of use and implementation.

I don't think that you can do things like reduce a dictionary to a list of tuples with a particular uniqueness constraint without actually specifying a dictionary. A dictionary has a specific purpose - key/value look-ups - and the implementation of the data structure is generally tailored to those needs. Sets are like dictionaries in many ways, but certain operations on sets don't make sense on a dictionary (union, disjunction, etc).

I don't see this violating the 'Zen of Python' of doing things one way. While you can use a sorted dictionary to do what a dictionary does without using the sorted part, you're more violating Occam's razor and likely causing a performance penalty. I see this as different than being able to syntactically do thing different ways a la Perl.

The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It seems to me this profusion of specialized collections types is in conflict with this Python precept.

Not remotely. There are several different things being done here. We choose the right tool for the job. All of these containers are modeled on decades-old tried, tested and true CS concepts.

Dictionaries are not like tuples: they are optimized for key-value lookup. The tuple is also immutable, which distinguishes it from a list (you could think of it as sort of like a frozenlist ). If you find yourself converting dictionaries to lists and back, you are almost certainly doing something wrong; an example would help.

Named tuples exist for convenience and are intended to replace simple classes rather than dictionaries, really. Ordered dictionaries are just a bit of wrapping to remember the order in which things were added to the dictionary. And neither is new in 3.x (although there may be better language support for them; I haven't looked).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM