
Is Python's Counter expected to behave as follows?

I have a list of dictionaries, rows, as follows:

rows = [
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 7},
    {'sex': 1, 'newspaper_sheet__country': 'ML', 'n': 5},
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 10}
]

I then have 2 Counters

from collections import Counter

counts = Counter()
counts1 = Counter()

I'm updating the two counters in the following two ways:

for row in rows:
    counts.update({(row['sex'], row['newspaper_sheet__country']): row['n']})

and

counts1.update({(row['sex'], row['newspaper_sheet__country']): row['n'] for row in rows})

I would expect the two counters to end up with the same values, since the only difference is that one uses a for loop and the other a dict comprehension.

Why are the 2 values different?

When Counter.update is called in each iteration of a for loop, the Counter object is updated with the input dict on every call, so the counts for a repeated key are added together.
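As a small illustration of why repeated calls accumulate: Counter.update adds to an existing count, whereas a plain dict's update replaces the value. The names plain and tally below are made up for this sketch and do not appear in the question.

```python
from collections import Counter

key = (2, 'ML')

# Plain dict: the second update replaces the first value.
plain = {}
plain.update({key: 7})
plain.update({key: 10})

# Counter: the second update is added to the existing count.
tally = Counter()
tally.update({key: 7})
tally.update({key: 10})

print(plain[key])  # 10
print(tally[key])  # 17
```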

With a dict comprehension, the key-value pairs are collected into a single dict before that dict is passed to Counter.update. Because a later value for a duplicate key in a dict comprehension overrides the earlier value for that key, the value 10 for the key (2, 'ML') overrides the value 7, and the Counter object never accounts for the 7.

Calling .update in a loop like that is not equivalent to passing the result of that dictionary comprehension. Look at what the dictionary comprehension creates:

>>> rows = [
...     {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 7},
...     {'sex': 1, 'newspaper_sheet__country': 'ML', 'n': 5},
...     {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 10}
... ]
>>> {(row['sex'], row['newspaper_sheet__country']): row['n'] for row in rows}
{(2, 'ML'): 10, (1, 'ML'): 5}

Dictionaries have unique keys, and the last item seen is kept.
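Putting the two approaches from the question side by side makes the effect visible. This is only a reproduction sketch: loop_counts and comp_counts are illustrative names standing in for counts and counts1.

```python
from collections import Counter

rows = [
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 7},
    {'sex': 1, 'newspaper_sheet__country': 'ML', 'n': 5},
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 10},
]

# One update per row: counts for the repeated key (2, 'ML') are added.
loop_counts = Counter()
for row in rows:
    loop_counts.update({(row['sex'], row['newspaper_sheet__country']): row['n']})

# One update in total: the dict has already dropped the 7 before update runs.
comp_counts = Counter()
comp_counts.update(
    {(row['sex'], row['newspaper_sheet__country']): row['n'] for row in rows}
)

print(loop_counts)  # Counter({(2, 'ML'): 17, (1, 'ML'): 5})
print(comp_counts)  # Counter({(2, 'ML'): 10, (1, 'ML'): 5})
```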

The difference comes from what update() receives when it is called with a dict comprehension.

With the for-loop approach, the counter is updated once per row and adds up the counts for matching keys. With the dict comprehension approach, update() only ever receives a single dictionary with unique keys.

The dict comprehension approach can be broken down as:

dic = {(row['sex'], row['newspaper_sheet__country']): row['n'] for row in rows}
print(dic)  # dic only contains unique key value pairs here
counts1.update(dic)

So counts1 is updated just once, with a dict in which duplicate keys have already collapsed, while counts is updated once per row by the loop.
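If the goal is the summed totals, one possible way to write it is to add each row's n directly to the Counter entry, so that no intermediate dict can discard a duplicate key. The helper name total_by_key is hypothetical and is not part of the question or of the collections module.

```python
from collections import Counter

def total_by_key(rows):
    """Sum 'n' per (sex, country) key; hypothetical helper for illustration."""
    totals = Counter()
    for row in rows:
        # Missing keys start at 0 in a Counter, so += is safe here.
        totals[(row['sex'], row['newspaper_sheet__country'])] += row['n']
    return totals

rows = [
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 7},
    {'sex': 1, 'newspaper_sheet__country': 'ML', 'n': 5},
    {'sex': 2, 'newspaper_sheet__country': 'ML', 'n': 10},
]

totals = total_by_key(rows)
print(totals[(2, 'ML')])  # 17
print(totals[(1, 'ML')])  # 5
```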

