
Why does creating a list of lists produce unexpected behavior?

EDIT: This question is about why the behavior is what it is, not how to get around it, which is what the alleged duplicate is about.


I've used the following notation to create lists of a certain size in different cases. For example:

>>> [None] * 5
[None, None, None, None, None]
>>>

This appears to work as expected and is shorter than:

>>> [None for _ in range(5)]
[None, None, None, None, None]
>>>

I then tried to create a list of lists using the same approach:

>>> [[]] * 5
[[], [], [], [], []]
>>>

Fair enough. It seems to work as expected.

However, while going through the debugger, I noticed that all the sub-list buckets had the same value, even though I had added only a single item. For example:

>>> t = [[]] * 5
>>> t
[[], [], [], [], []]
>>> t[1].append(4)
>>> t
[[4], [4], [4], [4], [4]]
>>> t[0] is t[1]
True
>>>

I was not expecting all top-level array elements to be references to a single sub-list; I expected 5 independent sub-lists.

For that, I had to write code like so:

>>> t = [[] for _ in range(5)]
>>> t
[[], [], [], [], []]
>>> t[2].append(4)
>>> t
[[], [], [4], [], []]
>>> t[0] is t[1]
False
>>>

I'm clearly missing something, probably a historical fact or simply a different way in which the consistency here is viewed.

Can someone explain why two code snippets that one would reasonably expect to be equivalent actually end up implicitly producing different and non-obvious (IMO) results, especially given Python's zen of always being explicit and obvious?

Please note that I'm already aware of this question, which is different from what I'm asking.

I'm simply looking for a detailed explanation/justification. If there are historical, technical, and/or theoretical reasons for this behavior, then please be sure to include a reference or two.

When you do the following:

[[]]*n

You are first creating a list, then using the * operator with an int n. This takes whatever objects are in your list and creates a new list that repeats them n times.

But since in Python explicit is better than implicit, you don't implicitly make a copy of those objects. Indeed, this is consistent with the semantics of Python.

Try to name a single case where Python implicitly makes a copy.

Furthermore, it is consistent with list addition:
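To illustrate the point, here is a minimal sketch showing that assignment, argument passing, and repetition all share one object, and that an independent copy only appears when you ask for it. The names inner, alias, row, and append_marker are made up for this example; copy.deepcopy comes from the standard library.

```python
import copy

inner = []
alias = inner          # assignment binds a second name to the same list
row = [inner] * 3      # repetition stores the same reference three times


def append_marker(lst):
    # Argument passing shares the object too, so the caller sees this change.
    lst.append("x")


append_marker(alias)

# Every slot in row is still the very same object as inner.
shared = all(item is inner for item in row)

# Copies have to be requested explicitly.
independent = [copy.deepcopy(inner) for _ in range(3)]
independent[0].append("y")

print(row)          # all three entries changed together
print(independent)  # only the first copy gained "y"
```

Only the deepcopy line produces new objects; everything before it just hands around references to the one list.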

l = [1, [], 'a']
l2 = l + l + l
l[1].append('foo')
print(l2)

And the output:

[1, ['foo'], 'a', 1, ['foo'], 'a', 1, ['foo'], 'a']

Now, as noted in the comments, if you are coming from C++ it makes sense that the above would be surprising, but if you are used to Python, the above is what you would expect.

On the other hand:

[[] for _ in range(5)]

is a list comprehension. It is equivalent to:

lst = []
for _ in range(5):
    lst.append([])

Here, clearly, each pass through the loop creates a new list, because the literal [] is evaluated again every time. That is how literal syntax works.

As an aside, I almost never use the * operator on lists, except for one particular idiom I am fond of:
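One way to check the difference, as a small sketch (the variable names are only illustrative), is to count how many distinct objects each construction holds:

```python
repeated = [[]] * 5             # the literal [] is evaluated once, then repeated
built = [[] for _ in range(5)]  # the literal [] is evaluated on every iteration

# id() identifies an object for as long as it is alive, so the size of the
# set of ids is the number of distinct list objects in each outer list.
distinct_repeated = len({id(x) for x in repeated})
distinct_built = len({id(x) for x in built})

print(distinct_repeated, distinct_built)
```

The repeated list holds a single inner list five times, while the comprehension holds five separate ones.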

>>> x = list(range(1, 22))
>>> it_by_three = [iter(x)]*3
>>> for a,b,c in zip(*it_by_three):
...    print(a, b, c)
...
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
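This idiom works precisely because the three entries are the same iterator, so zip pulls consecutive items from one shared stream. A hedged sketch that wraps it in a function follows; chunks_of is a hypothetical name, not something from the standard library.

```python
def chunks_of(iterable, n):
    """Hypothetical helper: yield consecutive n-tuples from iterable."""
    shared = [iter(iterable)] * n  # n references to ONE iterator
    # zip asks each "argument" for an item in turn; since they are all the
    # same iterator, every tuple consumes n consecutive items.
    return zip(*shared)


triples = list(chunks_of(range(1, 22), 3))
leftover_dropped = list(chunks_of(range(7), 3))

print(triples[0], triples[-1])
print(leftover_dropped)
```

Note that zip stops at the first exhausted iterator, so a trailing group with fewer than n items is silently dropped.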

For CPython, the relevant part of the source code is the function list_repeat in listobject.c. An enlightening snippet is reproduced below, with my added comments:

np = (PyListObject *) PyList_New(size);  // make a new PyListObject

/* some code omitted */

items = np->ob_item;          // grabs the list of pointers of the *new* object
if (Py_SIZE(a) == 1) {        // this is the case for a 1-element list being multiplied
    elem = a->ob_item[0];     // grabs the pointer of the element of the *original* object
    for (i = 0; i < n; i++) {
        items[i] = elem;      // assigns the original pointer to the new list
        Py_INCREF(elem);
    }
    return (PyObject *) np;
}

Since a PyListObject is essentially a vector of pointers to the list elements, it is simple to assign these pointers as elements of the new PyListObject.
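For readers less comfortable with C, the one-element branch can be modelled roughly in Python. This is only an illustrative sketch, not CPython code, and repeat_one is a hypothetical name:

```python
def repeat_one(a, n):
    """Hypothetical pure-Python model of the 1-element branch shown above."""
    elem = a[0]          # the single reference held by the original list
    items = [None] * n   # preallocated slots, loosely like PyList_New(size)
    for i in range(n):
        items[i] = elem  # store the same reference again; nothing is copied
    return items


original = [[]]
result = repeat_one(original, 5)
result[1].append(4)

print(result)
print(original)
```

Appending through one slot is visible through every slot, and through the original list as well, since they all refer to the same inner list.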

By contrast, imagine the code if the object located at each pointer needed to be copied. It would be more complex, and there would be a noticeable performance hit. However, I'm not going to speculate regarding the motivation for this design decision.
