Why does Python treat tuples, lists, sets and dictionaries as fundamentally different things?

One of the reasons I love Python is the expressive power and reduced programming effort provided by tuples, lists, sets and dictionaries. Once you understand list comprehensions and a few of the basic patterns using in and for, life gets so much better! Python rocks.

However, I do wonder why these constructs are treated as differently as they are, and how this is changing (getting stranger) over time. Back in Python 2.x, I could've made an argument that they were all just variations of a basic collection type, and that it was kind of irritating that some non-exotic use cases require you to convert a dictionary to a list and back again. (Isn't a dictionary just a list of tuples with a particular uniqueness constraint? Isn't a list just a set with a different kind of uniqueness constraint?)

Now in the 3.x world, it's gotten more complicated. There are now named tuples -- starting to feel more like a special-case dictionary. There are now ordered dictionaries -- starting to feel more like a list. And I just saw a recipe for ordered sets. I can picture this going on and on ... what about unique lists, etc.

The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It seems to me this profusion of specialized collection types is in conflict with this Python precept.

What do the hardcore Pythonistas think?

These data types all serve different purposes, and in an ideal world you might be able to unify them more. However, in the real world we need efficient implementations of the basic collections, and ordering, for example, adds a runtime penalty.

Named tuples mainly serve to make the interface of stat() and the like more usable, and they can also be nice when dealing with SQL row sets.

The big unification you're looking for is actually there, in the form of the different access protocols (getitem, getattr, iter, ...), which these types mix and match for their intended purposes.
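As a rough illustration of that last point, the sketch below runs four different built-in collections through one helper that relies only on the length, iteration and membership protocols. The helper name describe and the sample data are made up for this example.

```python
def describe(collection, probe):
    """Summarise any object that supports len(), iteration and `in`."""
    return {
        "size": len(collection),           # uses __len__
        "items": sorted(collection),       # uses __iter__
        "has_probe": probe in collection,  # uses __contains__
    }

samples = [
    [1, 2, 3],                  # list
    (1, 2, 3),                  # tuple
    {1, 2, 3},                  # set
    {1: "a", 2: "b", 3: "c"},   # dict: iteration and `in` operate on keys
]
results = [describe(sample, 2) for sample in samples]
```

All four calls produce the same summary, because the helper never asks which concrete type it was given.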

tl;dr (duck-typing)

You're correct to see some similarities in all these data structures. Remember that Python uses duck-typing (if it looks like a duck and quacks like a duck then it is a duck). If you can use two objects in the same situation then, for your current intents and purposes, they might as well be the same data type. But you always have to keep in mind that if you try to use them in other situations, they may no longer behave the same way.

With this in mind we should take a look at what's actually different and the same about the four data types you mentioned, to get a general idea of the situations where they are interchangeable.

Mutability (can you change it?)

You can make changes to dictionaries, lists, and sets. Tuples cannot be "changed" without making a copy.

  • Mutable: dict, list, set

  • Immutable: tuple

Python string is also an immutable type. Why do we want some immutable objects? I would paraphrase from this answer:

  1. Immutable objects can be optimized a lot

  2. In Python, the built-in mutable containers (list, dict, set) are not hashable, and only hashable objects can be members of sets or keys in dictionaries.

Comparing across this property, lists and tuples seem like the "closest" two data types. At a high level, a tuple is an immutable "freeze-frame" version of a list. This makes lists useful for data sets that will be changing over time (since you don't have to copy a list to modify it) but tuples useful for things like dictionary keys (which must be hashable).
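A small sketch of the dictionary-key point, with made-up names (point, distances): a tuple works as a key, the equivalent list is rejected, and "changing" a tuple really builds a new one.

```python
point = (3, 4)               # a tuple is immutable and hashable
distances = {point: 5.0}     # so it can serve as a dictionary key

try:
    distances[[3, 4]] = 5.0  # a list is mutable and unhashable
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# "Changing" a tuple builds a new object; the original is untouched.
moved = point + (5,)
```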

Ordering (and a note on abstract data types)

A dictionary, like a set, has no inherent conceptual order to it. This is in contrast to lists and tuples, which do have an order. The order of the items in a dict or a set is abstracted away from the programmer, meaning that if element A comes before B in a for k in mydata loop, you shouldn't (and generally can't) rely on A being before B once you start making changes to mydata. (Since Python 3.7 a dict does keep its keys in insertion order as a language guarantee; a set still makes no such promise.)

  • Order-preserving: list, tuple

  • Non-order-preserving: dict, set

Technically, if you iterate over mydata twice in a row it'll be in the same order, but this is more a convenient feature of the mechanics of Python than a part of the set abstract data type (the mathematical definition of the data type). Lists and tuples do guarantee order though, especially tuples, which are immutable.
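One way to see the difference in practice, as a minimal sketch with made-up variable names: positions are meaningful for lists and tuples, while a set has no positions at all, and equality follows the same rule.

```python
letters_list = ["a", "b", "c"]
letters_tuple = ("a", "b", "c")
letters_set = {"a", "b", "c"}

# Position is meaningful for sequences.
second_items = (letters_list[1], letters_tuple[1])

try:
    letters_set[1]             # a set has no positions to index into
    set_index_error = None
except TypeError as exc:
    set_index_error = str(exc)

# Equality reflects the same idea: order matters for lists, not for sets.
lists_equal = ["a", "b"] == ["b", "a"]
sets_equal = {"a", "b"} == {"b", "a"}
```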

What you see when you iterate (if it walks like a duck...)

  • One "item" per "element": set, list, tuple

  • Two "items" per "element": dict

I suppose here you could see a named tuple, which has both a name and a value for each element, as an immutable analogue of a dictionary. But this is a tenuous comparison; keep in mind that duck-typing will cause problems if you try to use a dictionary-only method on a named tuple, or vice versa.
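To make the comparison concrete, here is a short sketch using collections.namedtuple. The type name Point and the inventory data are invented for the example.

```python
from collections import namedtuple

inventory = {"apples": 3, "pears": 5}
keys_only = list(inventory)            # plain iteration yields the keys
pairs = list(inventory.items())        # .items() yields (key, value) pairs

Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
values_only = list(p)                  # iterating a named tuple yields values
as_mapping = p._asdict()               # explicit conversion to a mapping

# A named tuple does not offer the dict interface itself.
has_dict_methods = hasattr(p, "items")
```

The named tuple has to be converted with _asdict() before dictionary-style code can use it, which is where the analogy starts to strain.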

Direct responses to your questions

Isn't a dictionary just a list of tuples with a particular uniqueness constraint?

No, there are several differences. A dictionary is not organized around the order of its items the way a list is; you reach a value by its key, not by its position.

Also, a dictionary has a key and a value for each "element". A tuple, on the other hand, can have an arbitrary number of elements, but each with only a value.

Because of the mechanics of a dictionary, where keys act like a set, you can look up values in constant time if you have the key. In a list of tuples (pairs here), you would need to iterate through the list until you found the key, meaning search would be linear in the number of elements in your list.
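The lookup difference can be sketched as follows. The function find_in_pairs and the sample data are hypothetical, written only to show the linear scan a list of pairs would need.

```python
pairs = [("alice", 30), ("bob", 25), ("carol", 41)]

def find_in_pairs(pairs, key):
    """Linear scan: check each (key, value) pair until one matches."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)

ages = dict(pairs)   # hash table built from the same pairs

age_from_list = find_in_pairs(pairs, "carol")   # O(n) comparisons
age_from_dict = ages["carol"]                   # O(1) on average
```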

Most importantly, though, dictionary items can be changed, while tuples cannot.

Isn't a list just a set with a different kind of uniqueness constraint?

Again, I'd stress that sets have no inherent ordering, while lists do. This makes lists much more useful for representing things like stacks and queues, where you want to be able to remember the order in which you appended items. Sets offer no such guarantee. However, they do offer the advantage of membership lookups in constant time, while lists again take linear time.
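A minimal sketch of both halves of that argument, using invented sample values: a list used as a stack, and a set used for membership tests and de-duplication.

```python
stack = []
stack.append("first")
stack.append("second")
last_in = stack.pop()          # lists remember the order of appends

seen = {"red", "green", "blue"}
is_member = "green" in seen    # hash lookup, constant time on average

# Duplicates are kept (in order) by a list but collapse in a set.
as_list = [1, 1, 2]
as_set = set(as_list)
```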

There are now named tuples -- starting to feel more like a special-case dictionary. There are now ordered dictionaries -- starting to feel more like a list. And I just saw a recipe for ordered sets. I can picture this going on and on ... what about unique lists, etc.

To some degree I agree with you. However, data structure libraries can be useful for supporting common use cases of already well-established data structures. This keeps the programmer from wasting time trying to come up with custom extensions to the standard structures. As long as it doesn't get out of hand, and we can still see the unique usefulness of each solution, it's good to have a wheel on the shelf so we don't need to reinvent it.

A great example is the Counter() class. This specialized dictionary has been of use to me more times than I can count (badoom-tshhhhh!) and it has saved me the effort of coding up a custom solution. I'd much rather have a solution that the community is helping me to develop and keep in line with Python best practices than something that sits around in my custom data structures folder and only gets used once or twice a year.
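For anyone who hasn't met it, a short sketch of collections.Counter on an invented sentence:

```python
from collections import Counter

words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)

top = counts.most_common(1)   # the single most frequent word with its count
missing = counts["cat"]       # absent keys count as zero instead of raising
```

Doing the same with a plain dict means writing the "if the key exists, increment, otherwise set to 1" logic by hand every time.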

First of all, Ordered Dictionaries and Named Tuples were introduced in Python 2, but that's beside the point.

I won't point you at the docs since if you were really interested you would have read them already.

The first difference between collection types is mutability. tuple and frozenset are immutable types. This means they can be more efficient than list or set.

If you want something you can access randomly or in order, but will mainly change at the end, you want a list. If you want something you can also change at the beginning, you want a deque.
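A quick sketch of the list versus deque trade-off, with made-up values:

```python
from collections import deque

items = [2, 3]
items.append(4)          # cheap: lists grow at the end
items.insert(0, 1)       # works, but shifts every existing element

queue = deque([2, 3])
queue.append(4)
queue.appendleft(1)      # cheap at both ends
front = queue.popleft()
```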

You simply can't have your cake and eat it too -- every feature you add causes you to lose some speed.

dict and set are fundamentally different from lists and tuples. They store the hash of their keys, which lets you check very quickly whether an item is in them, but requires the keys to be hashable. You don't get the same membership testing speed with linked lists or arrays.

When you get to OrderedDict and namedtuple, you're talking about subclasses of the builtin types that were written in Python rather than in C (at least originally). They are for special cases, just like any other code in the standard library you have to import. They don't clutter up the namespace but are nice to have when you need them.
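As a small illustration, here is OrderedDict imported from collections and used for the order-specific operations a plain mapping isn't built around. The page names are invented.

```python
from collections import OrderedDict

visits = OrderedDict()
visits["home"] = 1
visits["about"] = 1
visits["contact"] = 1

is_a_dict = isinstance(visits, dict)   # it subclasses the builtin dict

visits.move_to_end("home")             # reordering is part of its API
order = list(visits)
oldest = visits.popitem(last=False)    # remove from the front
```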

One of these days, you'll be coding, and you'll say, "Man, now I know exactly what they meant by 'There should be one-- and preferably only one --obvious way to do it', a set is just what I needed for this, I'm so glad it's part of the Python language! If I had to use a list, it would take forever." That's when you'll understand why these different types exist.

A dictionary is indexed by key (in fact, it's a hash map); a generic list of tuples won't be. You might argue that both should be implemented as relations, with the ability to add indices at will, but in practice having optimized types for the common use cases is both more convenient and more efficient.

New specialized collections get added because they are common enough that lots of people would end up implementing them using more basic data types, and then you'd have the usual problems with wheel reinvention (wasted effort, lack of interoperability...). And if Python just offered an entirely generic construct, then we'd get lots of people asking "how do I implement a set using a relation", etc.

(btw, I'm using relation in the mathematical or DB sense)

All of these specialized collection types provide specific functionalities that are not adequately or efficiently provided by the "standard" data types of list, tuple, dict, and set.

For example, sometimes you need a collection of unique items, and you also need to retain the order in which you encountered them. You can do this using a set to keep track of membership and a list to keep track of order, but your solution will probably be slower and more memory-hungry than a specialized data structure designed for exactly this purpose, such as an ordered set.
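The set-plus-list approach described above might look something like this. The function name unique_in_order is made up for the sketch, and it is the hand-rolled version, not an ordered-set implementation.

```python
def unique_in_order(items):
    """Return the distinct items in first-seen order.

    Keeps two structures in step: a set for fast membership tests
    and a list to remember the order of first appearance.
    """
    seen = set()
    ordered = []
    for item in items:
        if item not in seen:
            seen.add(item)
            ordered.append(item)
    return ordered

deduplicated = unique_in_order([3, 1, 3, 2, 1])
```

Every item ends up stored twice, once in each structure, which is the extra memory cost the paragraph refers to.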

These additional data types, which you see as combinations or variations on the basic ones, actually fill gaps in functionality left by the basic data types. From a practical perspective, if Python's core or standard library did not provide these data types, then anyone who needed them would invent their own inefficient versions. They are used less often than the basic types, but often enough to make it worthwhile to provide standard implementations.

One of the things I like most in Python is agility. And having a lot of functional, efficient and usable collection types gives me that.

And there is still one way to do this - each type does its own job.

The world of data structures (language agnostic) can generally be boiled down to a few small basic structures - lists, trees, hash-tables and graphs, etc. and variants and combinations thereof. Each has its own specific purpose in terms of use and implementation.

I don't think that you can do things like reduce a dictionary to a list of tuples with a particular uniqueness constraint without actually specifying a dictionary. A dictionary has a specific purpose - key/value look-ups - and the implementation of the data structure is generally tailored to those needs. Sets are like dictionaries in many ways, but certain operations on sets don't make sense on a dictionary (union, disjunction, etc).
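For reference, these are the kinds of set operations meant here, sketched on two invented sets of integers:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

union = evens | small
intersection = evens & small
difference = evens - small
are_disjoint = evens.isdisjoint({1, 3, 5})
```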

I don't see this as violating the 'Zen of Python' principle of doing things one way. While you can use a sorted dictionary to do what a dictionary does without using the sorted part, you'd be violating Occam's razor more than anything, and likely paying a performance penalty. I see this as different from being able to do the same thing syntactically in different ways, à la Perl.

The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It seems to me this profusion of specialized collections types is in conflict with this Python precept.

Not remotely. There are several different things being done here. We choose the right tool for the job. All of these containers are modeled on decades-old tried, tested and true CS concepts.

Dictionaries are not like tuples: they are optimized for key-value lookup. The tuple is also immutable, which distinguishes it from a list (you could think of it as sort of like a frozenlist). If you find yourself converting dictionaries to lists and back, you are almost certainly doing something wrong; an example would help.

Named tuples exist for convenience and are intended to replace simple classes rather than dictionaries, really. Ordered dictionaries are just a bit of wrapping to remember the order in which things were added to the dictionary. And neither is new in 3.x (although there may be better language support for them; I haven't looked).
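To show what "replacing a simple class" can look like, here is a sketch with a hypothetical Employee record: fields are read by attribute like an object's, while the value still behaves as an ordinary immutable tuple.

```python
from collections import namedtuple

Employee = namedtuple("Employee", ["name", "dept"])
ada = Employee("Ada", "R&D")

name_by_attr = ada.name      # reads like an attribute on a small class
name_by_index = ada[0]       # still a tuple underneath

# Being immutable, an "update" returns a new record.
moved = ada._replace(dept="Research")
```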
