
Order of operations in a dictionary comprehension

I came across the following interesting construct:

assuming you have a list of lists as follows:

my_list = [['captain1', 'foo1', 'bar1', 'foobar1'], ['captain2', 'foo2', 'bar2', 'foobar2'], ...]

and you want to create a dict out of them with the 0-index elements being the keys. A handy way to do it would be this:

my_dict = {x.pop(0): x for x in my_list}
# {'captain1': ['foo1', 'bar1', 'foobar1'], ...}

As it seems, the pop precedes the assignment of list x as the value and that is why 'captain' does not appear in the values (it is already popped)

Now let's take this a step further and try to get a structure like:

# {'captain1': {'column1': 'foo1', 'column2': 'bar1', 'column3': 'foobar1'}, ...}

For this task I wrote the following:

my_headers = ['column1', 'column2', 'column3']
my_dict = {x.pop(0): {k: v for k, v in zip(my_headers, x)} for x in my_list}

but this returns:

# {'captain1': {'column3': 'bar1', 'column1': 'captain1', 'column2': 'foo1'}, 'captain2': {'column3': 'bar2', 'column1': 'captain2', 'column2': 'foo2'}}

so the pop in this case happens after the inner dictionary is constructed (or at least after the zip).

How can that be? How does this work?

The question is not about how to do it but rather why this behavior is seen.

I am using Python version 3.5.1.

Note: As of Python 3.8 and PEP 572, this was changed and the keys are evaluated first.


tl;dr Until Python 3.7: Even though Python does evaluate values first (the right side of the expression), this does appear to be a bug in (C)Python according to the reference manual, the grammar, and the PEP on dict comprehensions.

Though this was previously fixed for dictionary displays, where values were likewise evaluated before the keys, the patch wasn't amended to include dict comprehensions. This requirement was also mentioned by one of the core devs in a mailing list thread discussing this same subject.

According to the reference manual, Python evaluates expressions from left to right and assignments from right to left; a dict comprehension is really an expression containing expressions, not an assignment*:

{expr1: expr2 for ...}

where, according to the corresponding rule of the grammar, one would expect expr1: expr2 to be evaluated the same way as in displays. So both expressions should follow the defined order: expr1 should be evaluated before expr2 (and, if expr2 contains expressions of its own, they too should be evaluated from left to right).

The PEP on dict-comps additionally states that the following should be semantically equivalent:

The semantics of dict comprehensions can actually be demonstrated in stock Python 2.2, by passing a list comprehension to the built-in dictionary constructor:

>>> dict([(i, chr(65+i)) for i in range(4)])

is semantically equivalent to:

>>> {i : chr(65+i) for i in range(4)}

where the tuple (i, chr(65+i)) is evaluated left to right as expected.
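The equivalence quoted from the PEP can be checked directly. The following is a minimal sketch; the helper note and the list seen are hypothetical names introduced here only to record the order in which the two tuple elements are evaluated:

```python
seen = []

def note(label, val):
    """Record which side of the pair is being evaluated, then pass val through."""
    seen.append(label)
    return val

# The PEP's two forms build the same dictionary.
from_pairs = dict([(i, chr(65 + i)) for i in range(4)])
from_comp = {i: chr(65 + i) for i in range(4)}
print(from_pairs == from_comp)  # True

# Inside a tuple, the elements are evaluated left to right: key, then value.
dict([(note('key', i), note('value', chr(65 + i))) for i in range(2)])
print(seen)  # ['key', 'value', 'key', 'value']
```

The tuple form always yields key-then-value order, which is the order the PEP's equivalence suggests for the comprehension as well.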

Changing this to behave according to the rules for expressions would create an inconsistency in the creation of dicts, of course. Dictionary comprehensions and a for loop with assignments would then result in a different evaluation order, but that's fine since each is just following its own rules.

Though this isn't a major issue it should be fixed (either the rule of evaluation, or the docs) to disambiguate the situation.

* Internally, this does result in an assignment to a dictionary object, but this shouldn't break the behavior expressions should have. Users have expectations about how expressions should behave as stated in the reference manual.


As the other answerers pointed out, since you perform a mutating action in one of the expressions, you toss out any information on what gets evaluated first; using print calls, as Duncan did, sheds light on what is done.

A function to help in showing the discrepancy:

def printer(val):
    print(val, end=' ')
    return val

(Fixed) dictionary display:

>>> d = {printer(0): printer(1), printer(2): printer(3)}
0 1 2 3

(Odd) dictionary comprehension:

>>> t = (0, 1), (2, 3)
>>> d = {printer(i):printer(j) for i,j in t}
1 0 3 2

And yes, this applies specifically to CPython. I am not aware of how other implementations evaluate this specific case (though they should all conform to the Python Reference Manual).

Digging through the source is always nice (and you also find hidden comments describing the behavior), so let's peek at compiler_sync_comprehension_generator in the file compile.c:
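To check which order a given interpreter uses without reading printed output, the calls can be logged in a list instead. This is a sketch only; make_recorder and the variable names are hypothetical, and the version check relies on the note in the question that the order changed in Python 3.8:

```python
import sys

def make_recorder():
    """Return (record, log); record(val) appends val to log and returns val."""
    log = []

    def record(val):
        log.append(val)
        return val

    return record, log

pairs = ((0, 1), (2, 3))

# Dictionary display: keys are evaluated before values (Python 3.5+).
record, display_order = make_recorder()
d1 = {record(0): record(1), record(2): record(3)}

# Dictionary comprehension: value first up to 3.7, key first from 3.8 on.
record, comp_order = make_recorder()
d2 = {record(i): record(j) for i, j in pairs}

print(display_order)
print(comp_order)
print("key evaluated first:", sys.version_info >= (3, 8))
```

Both dictionaries come out equal either way; only the recorded order differs between interpreter versions.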

case COMP_DICTCOMP:
    /* With 'd[k] = v', v is evaluated before k, so we do
       the same. */
    VISIT(c, expr, val);
    VISIT(c, expr, elt);
    ADDOP_I(c, MAP_ADD, gen_index + 1);
    break;

This might seem like a good enough reason and, if it is judged as such, the issue should be classified as a documentation bug instead.

In a quick test I did, I switched these statements around (so that VISIT(c, expr, elt); gets visited first) while also switching the corresponding order in MAP_ADD (which is used for dict comps):

TARGET(MAP_ADD) {
    PyObject *value = TOP();   /* was key */
    PyObject *key = SECOND();  /* was value */
    PyObject *map;
    int err;

This results in the evaluation one would expect based on the docs, with the key evaluated before the value. (This does not cover the asynchronous versions; those require another switch.)


I'll drop a comment on the issue and update when and if someone gets back to me.

Created Issue 29652 -- Fix evaluation order of keys/values in dict comprehensions on the tracker. Will update the question when progress is made on it.

As it seems, the pop precedes the assignment of list x as the value and that is why 'captain' does not appear in the values (it is already popped)

No, the order in which it happens is irrelevant. You are mutating the list, so you will see the modified list after the pop wherever you use it. Note that in general you probably don't want to do this, as you will destroy the original list. Even if that doesn't matter this time, it's a trap for the unwary in the future.

In both cases the value side is calculated first and then the corresponding key. It's just that in your first case it doesn't matter whereas it does in the second.
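A small sketch of why the order makes no difference in the first case: the value is a reference to the same list object that pop mutates. The two function names below are hypothetical and exist only to spell out both orders explicitly:

```python
def value_then_key(row):
    value = row        # bind the list object first...
    key = row.pop(0)   # ...then mutate that same object
    return key, value

def key_then_value(row):
    key = row.pop(0)   # mutate first...
    value = row        # ...then bind the (already shortened) list
    return key, value

a = value_then_key(['captain1', 'foo1', 'bar1', 'foobar1'])
b = key_then_value(['captain1', 'foo1', 'bar1', 'foobar1'])
print(a)  # ('captain1', ['foo1', 'bar1', 'foobar1'])
print(a == b)  # True
```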

You can see this quite easily:

>>> def foo(a): print("foo", a)
... 
>>> def bar(a): print("bar", a)
... 
>>> { foo(a):bar(a) for a in (1, 2, 3) }
('bar', 1)
('foo', 1)
('bar', 2)
('foo', 2)
('bar', 3)
('foo', 3)
{None: None}
>>> 

Note that you should not write code that depends on the values being evaluated first: the behaviour may change in future versions (it was said in some places to have changed in Python 3.5 and later although in fact that appears not to be the case).

A simpler way to do this, which avoids mutating the original data structure:

my_dict = {x[0]: x[1:] for x in my_list}

Or your second example:

my_headers = ['column1', 'column2', 'column3']
my_dict = {x[0]: {k: v for k, v in zip(my_headers, x[1:])} for x in my_list}

To answer the comments: the zip uses the original x because it is evaluated before the pop, but it copies the contents of the list into a new object, so any later changes to the list aren't seen in the result. The first comprehension also uses the original x as the value, but there the value is the list object itself, so when the list is later mutated the value shows the mutation.
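The difference between copying the contents and keeping a reference can be sketched step by step, with the value side deliberately computed before the pop, as in the behaviour observed in the question. The names inner and alias are hypothetical:

```python
my_headers = ['column1', 'column2', 'column3']
x = ['captain1', 'foo1', 'bar1', 'foobar1']

# Value side of the second comprehension: a new dict built from x's
# contents at this moment.
inner = {k: v for k, v in zip(my_headers, x)}

# Value side of the first comprehension: just another reference to x.
alias = x

key = x.pop(0)  # the key side runs afterwards and mutates x

print(inner)  # {'column1': 'captain1', 'column2': 'foo1', 'column3': 'bar1'}
print(alias)  # ['foo1', 'bar1', 'foobar1']
```

The dict built through zip keeps 'captain1' because it was filled before the pop, while the alias reflects the mutation.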

As I said in a comment, that's because in a dictionary comprehension Python evaluates the value first. As a more Pythonic approach, you can use unpacking for this task instead of popping from the list in each iteration:

In [32]: my_list = [['captain1', 'foo1', 'bar1', 'foobar1'], ['captain2', 'foo2', 'bar2', 'foobar2']]

In [33]: {first: {"column{}".format(i): k for i, k in enumerate(last, 1)} for first, *last in my_list}
Out[33]: 
{'captain2': {'column3': 'foobar2', 'column1': 'foo2', 'column2': 'bar2'},
 'captain1': {'column3': 'foobar1', 'column1': 'foo1', 'column2': 'bar1'}}

Regarding the strange behavior of Python in evaluating the keys and values in a dictionary comprehension: after some experiments I realized that this behavior is reasonable rather than a bug.

I'll break down my impression into the following parts:

  1. In an assignment statement, Python evaluates the right side first. From the docs:

    Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.

  2. A dictionary comprehension is an expression and would be expected to evaluate left to right, but since Python translates it into an assignment under the hood, the value, which is the right-hand side, is evaluated first.

    For example, the following comprehension:

    {b.pop(0): b.pop(0) for _ in range(1)} is equivalent to the following snippet:


def dict_comprehension():
    the_dict = {}
    for _ in range(1):
        the_dict[b.pop(0)] = b.pop(0)
    return the_dict
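The loop form can be run on its own to confirm the right-hand-side-first rule for assignments. This is a sketch that passes b in as a parameter (an assumption made here so the function is self-contained) rather than reading a global:

```python
def loop_equivalent(b):
    the_dict = {}
    for _ in range(1):
        # Assignment: the right-hand side pops first (4), then the
        # subscript on the left-hand side pops the next item (0).
        the_dict[b.pop(0)] = b.pop(0)
    return the_dict

result = loop_equivalent([4, 0])
print(result)  # {0: 4}
```

The loop gives {0: 4} on any Python 3 version, because it is governed by the assignment rule. The comprehension matches it only up to Python 3.7; given the change noted in the question, from 3.8 on the same comprehension evaluates the key first.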

Here are some examples:

In [12]: b = [4, 0]

# simple rule : Python evaluates expressions from left to right.
In [13]: [[b.pop(0), b.pop(0)] for _ in range(1)]
Out[13]: [[4, 0]]

In [14]: b = [4, 0]
# while evaluating an assignment (aforementioned rule 1), the right-hand side is evaluated before the left-hand side.
In [15]: {b.pop(0): b.pop(0) for _ in range(1)}
Out[15]: {0: 4}

In [16]: b = [4, 0]
# This is not a dictionary comprehension and will be evaluated left to right.
In [17]: {b.pop(0): {b.pop(0) for _ in range(1)}}
Out[17]: {4: {0}}

In [18]: b = [4, 0]
# This is not a dictionary comprehension and will be evaluated left to right.
In [19]: {b.pop(0): b.pop(0) == 0}
Out[19]: {4: True}

In [20]: b = [4, 0]
# dictionary comprehension.
In [21]: {b.pop(0): {b.pop(0) for _ in range(1)} for _ in range(1)}
Out[21]: {0: {4}}

Regarding the disparity between the fact (or, better, the abstraction) that dictionary comprehensions are expressions and should be evaluated left to right (per the Python documentation) and the observed behavior: I think it is a gap in the Python documentation rather than a bug in the Python code. It is not reasonable to change the functionality just to make the documentation consistent and exception-free.

Actually, your observation doesn't require any special ordering of the operations. The reason is that x.pop(0) modifies the object x. So whether you evaluate the value (x) before or after the key (x.pop(0)) doesn't matter in this case.

Anyway, I don't think the Python language specification prescribes a certain order of operations here, which means that you should not rely on any particular order.

Actually, the standard implementation happens to evaluate the value before it evaluates the key, but nowhere in the standard is this stated. The only guarantee is that the key-value pairs are evaluated in iteration order and that they are inserted in that order.
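One way to avoid depending on the evaluation order at all is to make each step an explicit statement. A minimal sketch, assuming the data from the question; rows_to_dict is a hypothetical helper name:

```python
def rows_to_dict(rows, headers):
    """Map each row's first element to a dict of its remaining elements."""
    result = {}
    for row in rows:
        key, *rest = row                    # no mutation of the input row
        result[key] = dict(zip(headers, rest))
    return result

my_list = [['captain1', 'foo1', 'bar1', 'foobar1'],
           ['captain2', 'foo2', 'bar2', 'foobar2']]
my_headers = ['column1', 'column2', 'column3']

my_dict = rows_to_dict(my_list, my_headers)
print(my_dict['captain1'])  # {'column1': 'foo1', 'column2': 'bar1', 'column3': 'foobar1'}
```

Because the key and the remaining items are separated before anything else happens, the result is the same regardless of how a particular interpreter orders key and value evaluation, and my_list is left untouched.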
