
An issue with itertools.groupby in Python

Why does the following code return two separate False groups?

from itertools import groupby

content = '1\t2\t3\n4\t5\t\n7\t8\t9'

result = groupby((line.split('\t') for line in content.splitlines()),
                 key=lambda x: x[2] == '')

for k, v in result:
    print('--->', k, id(k))
    print(list(v))

The result is as follows:

---> False 505954168
[['1', '2', '3']]
---> True 505954192
[['4', '5', '']]
---> False 505954168
[['7', '8', '9']]

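According to the documentation, itertools.groupby

makes an iterator that returns consecutive keys and groups from the iterable. ... It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function).

Note "every time the value of the key function changes": your keys run False, True, False, so groupby starts a new group at each change and never merges the two False groups.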
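To see the consecutive-run behaviour in isolation, here is a minimal sketch (Python 3). The numbers list and the parity key are made up for illustration and are not part of the question:

```python
from itertools import groupby

# Hypothetical sample data: the key (odd or even) changes twice along the sequence.
numbers = [1, 3, 2, 4, 5, 7]

# groupby starts a new group whenever the key differs from the previous item's key.
runs = [(is_odd, list(group))
        for is_odd, group in groupby(numbers, key=lambda n: n % 2 == 1)]

print(runs)
# [(True, [1, 3]), (False, [2, 4]), (True, [5, 7])]
```

The key True appears twice because the two odd runs are not adjacent, which is the same thing that happens to the False rows in the question.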

You'll need to sort your input first:

genexp = (line.split('\t') for line in content.splitlines())
key = lambda x: x[2] == ''

result = groupby(sorted(genexp, key=key), key=key) # Note: same key function
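Applied to the question's data, the sorted approach might look like the sketch below (Python 3). The names rows and grouped are introduced here for illustration. Each group is converted to a list right away, because a group iterator is no longer usable once groupby advances to the next key:

```python
from itertools import groupby

content = '1\t2\t3\n4\t5\t\n7\t8\t9'
rows = [line.split('\t') for line in content.splitlines()]
key = lambda row: row[2] == ''

# Sorting by the same key puts equal keys next to each other (False sorts before True).
# sorted() is stable, so rows keep their original order within each group.
grouped = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

print(grouped)
# {False: [['1', '2', '3'], ['7', '8', '9']], True: [['4', '5', '']]}
```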

Alternatively, write your own grouping function. Frankly, it's not that hard:

from collections import defaultdict
dd = defaultdict(list)
for x in genexp:
    dd[key(x)].append(x)
result = dd.items()
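Wrapped up as a reusable function, the defaultdict approach could be sketched like this (Python 3). The helper name group_by_key is hypothetical, not something from the standard library:

```python
from collections import defaultdict

def group_by_key(items, key):
    """Collect items into lists keyed by key(item); the input need not be sorted."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

content = '1\t2\t3\n4\t5\t\n7\t8\t9'
rows = (line.split('\t') for line in content.splitlines())

grouped = group_by_key(rows, key=lambda row: row[2] == '')
print(grouped)
# {False: [['1', '2', '3'], ['7', '8', '9']], True: [['4', '5', '']]}
```

This makes a single pass and only requires the keys to be hashable rather than orderable. Keep in mind that a generator expression can be consumed only once, so if it has already been passed to sorted() it must be recreated before being grouped again.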
