Why use itertools.groupby instead of doing it yourself?

from collections import defaultdict
import itertools

items = [(0, 0), (0, 1), (1, 0), (1, 1)]

keyfunc = lambda x: x[0]

# Grouping yourself
item_map = defaultdict(list)
for item in items:
    item_map[keyfunc(item)].append(item)

# Using itertools.groupby
item_map = {}
for key, group in itertools.groupby(items, keyfunc):
    item_map[key] = list(group)

What is so great about itertools.groupby that I should use it instead of doing it myself? Can it perform the grouping in less time complexity? Or, am I missing the point with my use case, and groupby should be used for other cases?


Another poster mentioned that itertools.groupby will return a different result if the items to be grouped are not sorted by the key (or, more precisely, if items with equal keys are not consecutive).

For example, with items = [(0, 0), (1, 1), (0, 2)], if we don't sort on the key first, the itertools.groupby loop above produces

{0: [(0, 2)], 1: [(1, 1)]}

Whereas my implementation returns

{0: [(0, 0), (0, 2)], 1: [(1, 1)]}

Unless I'm misunderstanding the point, it would seem that the DIY method is better because it doesn't require the data to be sorted.
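The missing (0, 0) is easier to see if you list the raw groups instead of storing them in a dict. In this minimal sketch, groupby emits key 0 twice because the two 0-keyed items are not adjacent, and the dict assignment in the loop keeps only the last of those groups.

```python
import itertools

items = [(0, 0), (1, 1), (0, 2)]
keyfunc = lambda x: x[0]

# groupby only merges *adjacent* items with equal keys, so key 0 appears twice
groups = [(key, list(group)) for key, group in itertools.groupby(items, keyfunc)]
print(groups)  # [(0, [(0, 0)]), (1, [(1, 1)]), (0, [(0, 2)])]

# Assigning each group into a dict overwrites the earlier group for a repeated key,
# which is why (0, 0) disappears from the groupby-based item_map
item_map = {}
for key, group in groups:
    item_map[key] = group
print(item_map)  # {0: [(0, 2)], 1: [(1, 1)]}
```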

Here is the relevant part of the documentation:

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.
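Following that advice, sorting on the same key first makes the groupby version agree with the defaultdict version. This is only a sketch of the common sort-then-group idiom. Note the cost: sorted() builds a full list and takes O(n log n), whereas the defaultdict loop is a single pass with average O(1) dict operations per item, so groupby gives no complexity advantage when the goal is a dict.

```python
import itertools
from collections import defaultdict

items = [(0, 0), (1, 1), (0, 2)]
keyfunc = lambda x: x[0]

# Sort on the same key first; sorted() is stable, so items with equal keys
# keep their original relative order inside each group
item_map = {key: list(group)
            for key, group in itertools.groupby(sorted(items, key=keyfunc), keyfunc)}

# The one-pass defaultdict version needs no sorting at all
diy_map = defaultdict(list)
for item in items:
    diy_map[keyfunc(item)].append(item)

print(item_map == diy_map)  # True
```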

Generally, the point of using iterators is to avoid keeping an entire data set in memory. In your example, it doesn't matter because:

  • The input is already all in memory.
  • You're just dumping everything into a dict, so the output is also all in memory.
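For contrast, here is a sketch where the input is not in memory at all: readings() is a hypothetical endless generator whose keys already arrive in order. Because groupby pulls items lazily, itertools.islice can stop after the first two groups, whereas the defaultdict loop from the question would never finish on this input.

```python
import itertools

def readings():
    """Hypothetical endless stream of (minute, value) pairs, already in key order."""
    for n in itertools.count():
        yield (n // 3, n)  # three readings per minute

keyfunc = lambda x: x[0]

# Only the items needed for the first two groups (plus one lookahead) are ever read
first_two = [(key, list(group))
             for key, group in itertools.islice(itertools.groupby(readings(), keyfunc), 2)]
print(first_two)  # [(0, [(0, 0), (0, 1), (0, 2)]), (1, [(1, 3), (1, 4), (1, 5)])]
```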

"Or, am I missing the point with my use case, and groupby should be used for other cases?"

I think that's an accurate assessment.

Suppose items is an iterator (e.g. lines being read from stdin) and the output is something other than an in-memory data structure (e.g. stdout):

for key, group in itertools.groupby(items, keyfunc):
    print("{}: {}".format(key, list(group)))

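Now it would be less trivial to do that yourself: you would have to track the previous key and emit the accumulated group each time the key changes, which groupby does for you.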
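A minimal sketch of such a hand-rolled grouper, under the hypothetical name group_consecutive. It is a simplification for illustration, not how CPython implements groupby; one difference is that it buffers each group in a list, while groupby hands out each group as a lazy sub-iterator.

```python
import itertools

def group_consecutive(iterable, keyfunc):
    """Yield (key, list_of_items) for each run of adjacent items with equal keys."""
    current_key = None
    current_group = []
    for item in iterable:
        key = keyfunc(item)
        if current_group and key != current_key:
            # Key changed: flush the finished run and start a new one
            yield current_key, current_group
            current_group = []
        current_key = key
        current_group.append(item)
    if current_group:
        # Flush the last run once the input is exhausted
        yield current_key, current_group

items = [(0, 0), (1, 1), (0, 2)]
keyfunc = lambda x: x[0]

mine = list(group_consecutive(items, keyfunc))
theirs = [(key, list(group)) for key, group in itertools.groupby(items, keyfunc)]
print(mine == theirs)  # True
```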
