Consider this...
from itertools import groupby
from operator import itemgetter

data = [{'pid': 1, 'items': 1}, {'pid': 2, 'items': 5}, {'pid': 1, 'items': 3}]
data = sorted(data, key=itemgetter('pid'))
for pid, rows in groupby(data, lambda x: x['pid']):
    print(pid, sum(r['items'] for r in rows))
    for key in ['items']:
        print(pid, sum(r[key] for r in rows))
The first print() call prints the right number: 4 for pid 1, 5 for pid 2. The second print() call, in the loop through the key list, prints 0 for both. What's going on?
The rows object you get from groupby is an iterator that can only be consumed once. As you iterate through it for your first print statement, you consume the values, so rows is already exhausted when you try to iterate over it the next time. The second sum() sees no items and returns 0.
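A minimal sketch of that single-pass behavior, reusing the data from the question. The names first_pass, second_pass, and results are only for illustration and are not part of the original code:

```python
from itertools import groupby
from operator import itemgetter

data = [{'pid': 1, 'items': 1}, {'pid': 2, 'items': 5}, {'pid': 1, 'items': 3}]
data = sorted(data, key=itemgetter('pid'))

results = []  # illustrative: collects (pid, first_pass, second_pass)
for pid, rows in groupby(data, key=itemgetter('pid')):
    first_pass = sum(r['items'] for r in rows)   # consumes every row in the group
    second_pass = sum(r['items'] for r in rows)  # nothing left to iterate, so this is 0
    results.append((pid, first_pass, second_pass))
    print(pid, first_pass, second_pass)
```

The second sum() does not raise an error; it simply adds up an empty sequence, which is why the symptom is a silent 0 rather than an exception.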
You could use row_list = list(rows), then use row_list, if you want the items to be persistent for multiple iteration passes.
For greater clarity, I suggest putting your code into the Python REPL, inspecting type(rows) in that loop, and looking at what API that object provides.
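As a rough sketch of what that inspection might look like outside the REPL, the checks below use only the standard library. The variable names (grouped, is_iterator, and so on) are made up for this example:

```python
from collections.abc import Iterator
from itertools import groupby
from types import GeneratorType

data = [{'pid': 1, 'items': 1}, {'pid': 1, 'items': 3}, {'pid': 2, 'items': 5}]

grouped = groupby(data, key=lambda x: x['pid'])
pid, rows = next(grouped)  # take the first (key, group) pair

print(type(rows))  # an itertools-internal group type, not a list

is_iterator = isinstance(rows, Iterator)        # supports next() and iter()
is_generator = isinstance(rows, GeneratorType)  # not a generator in the strict sense
has_len = hasattr(rows, '__len__')              # no len(), unlike a list
is_own_iterator = iter(rows) is rows            # iter() returns the same object, so there is no "rewind"

print(is_iterator, is_generator, has_len, is_own_iterator)
```

The last check is the telling one: because iter(rows) hands back rows itself, a second for loop or sum() picks up wherever the first one stopped, which here means at the end.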
You're running into a very common issue with iterators: they can only be iterated through once. The functions in itertools return iterators as a rule.
From the docs for groupby:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.
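The quoted behavior can be seen with a small sketch that advances the groupby() object by hand before reading the first group. The input list and the names stale and current are assumptions made for this illustration:

```python
from itertools import groupby

grouped = groupby([1, 1, 2, 2])

first_key, first_group = next(grouped)    # group for key 1, not consumed yet
second_key, second_group = next(grouped)  # advancing groupby() skips past the first group

stale = list(first_group)     # the earlier group is no longer visible
current = list(second_group)  # the current group still yields its items

print(first_key, stale)
print(second_key, current)
```

This is why collecting the group objects themselves for later use tends to give empty results; converting each group with list() while it is still the current one avoids the problem.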
Simply remove one of your print() calls, and watch it work. If you need to access the returned data multiple times, a list is a potential structure to save your results in.
Fixed code:
from itertools import groupby
from operator import itemgetter

data = [{'pid': 1, 'items': 1}, {'pid': 2, 'items': 5}, {'pid': 1, 'items': 3}]
data = sorted(data, key=itemgetter('pid'))
for pid, rows_gen in groupby(data, lambda x: x['pid']):
    rows = list(rows_gen)  # save the group to access more than once
    print(pid, sum(r['items'] for r in rows))
    for key in ['items']:
        print(pid, sum(r[key] for r in rows))