Group 'continuation' items in list. Is storing state in itertools groupby key function bad?

Question

I'm new to Python and I'm trying to write a function that groups list items, where None marks a continuation of the preceding item, like so:

>>> g([1, None, 1, 1, None, None, 1])
[[1, None], [1], [1, None, None], [1]]

My real data has much more complex items but I've simplified things to the core for this question.

This is my solution so far:

import itertools

# input
x = [1, None, 1, 1, None, None, 1]

# desired output from g(x)
y = [[1, None], [1], [1, None, None], [1]]


def f(x):
    if x is None:
        f.lastx = x
    else:
        if x != f.lastx:
            f.counter += 1
    return f.counter


def g(x):
    f.lastx = None
    f.counter = 0
    z = [list(g) for _, g in itertools.groupby(x, f)]
    return z


assert y == g(x)

This works but I know it's very ugly.

Is there a better (and more Pythonic) way to do this? For example, without a stateful key function.

Answer 1

You could combine itertools.groupby and itertools.accumulate:

>>> from itertools import accumulate, groupby
>>> dat = [1, None, 1, 1, None, None, 1]
>>> it = iter(dat)
>>> acc = accumulate(x is not None for x in dat)
>>> [[next(it) for _ in g] for _, g in groupby(acc)]
[[1, None], [1], [1, None, None], [1]]

This works because accumulate yields a running count of the non-None items, so the value increases at the start of every new group and stays constant across the None items that follow:

>>> list(accumulate(x is not None for x in dat))
[True, 1, 2, 3, 3, 3, 4]
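
To make the pairing between counts and items explicit, the running count can be zipped with the data and used as the groupby key. This is a minimal sketch of the same idea, not a replacement for the snippet above; the names group_ids, pairs and groups are introduced here purely for illustration.

```python
from itertools import accumulate, groupby

dat = [1, None, 1, 1, None, None, 1]

# Running count of non-None items: it increases at each group start
# and stays flat across the None items that follow.
group_ids = accumulate(x is not None for x in dat)

# Pair each item with its group id, then group on that id.
# (group_ids, pairs and groups are illustrative names, not from the answer.)
pairs = zip(group_ids, dat)
groups = [[item for _, item in grp]
          for _, grp in groupby(pairs, key=lambda p: p[0])]

print(groups)
```

Note that the first count is True rather than 1, but True == 1 in Python, so groupby still treats the first two entries as one group.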

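To handle a stream rather than a list, tee the iterator so that one copy feeds accumulate and the other supplies the items. The extra memory used is only on the order of the size of one group.

from itertools import accumulate, groupby, tee

def cgroup(source):
    # One copy feeds the running count, the other supplies the items.
    it, it2 = tee(iter(source), 2)
    acc = accumulate(x is not None for x in it)
    for _, g in groupby(acc):
        # Pull as many items from it2 as there are entries in this group.
        yield [next(it2) for _ in g]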

This still gives

>>> list(cgroup([1, None, 1, 1, None, None, 1]))
[[1, None], [1], [1, None, None], [1]]

but will work even with infinite sources:

>>> from itertools import chain, islice, repeat
>>> stream = chain.from_iterable(repeat([1, 1, None]))
>>> list(islice(cgroup(stream), 10))
[[1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None], [1], [1, None]]
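
For comparison, the same streaming behaviour can be sketched as a plain generator with no tee or groupby at all. This is an alternative illustration, not the approach of the answer above, and cgroup_loop is a hypothetical name chosen here.

```python
from itertools import chain, islice, repeat


def cgroup_loop(source):
    """Yield lists that each start at an item and absorb the Nones after it.

    Hypothetical helper for illustration; it buffers at most one group.
    """
    group = []
    for item in source:
        # A non-None item closes the current group (if any) and opens a new one.
        if item is not None and group:
            yield group
            group = []
        group.append(item)
    # Emit whatever is left once a finite source is exhausted.
    if group:
        yield group


print(list(cgroup_loop([1, None, 1, 1, None, None, 1])))

# Works lazily on an endless source as well.
stream = chain.from_iterable(repeat([1, 1, None]))
print(list(islice(cgroup_loop(stream), 4)))
```

The trade-off is mostly stylistic: the loop is longer but keeps its state in a local variable, which also avoids the stateful key function the question was worried about.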

Answer 2

It's not perfect, because it needs a third-party extension (iteration_utilities.split) and some tinkering, but it does produce the desired output:

>>> from iteration_utilities import split, is_not_None

>>> lst = [1, None, 1, 1, None, None, 1]

>>> list(split(lst, is_not_None, keep_after=True))[1:]
[[1, None], [1], [1, None, None], [1]]

With this approach the first element needs to be discarded (hence the [1:]), because otherwise the result would start with an empty sublist.
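
To see where the leading empty sublist comes from, here is a pure-Python sketch that starts a new sublist at every item matching a predicate. The function split_before_matches is a hypothetical stand-in written for this illustration; it is not the implementation of iteration_utilities.split and is only assumed to mirror the behaviour described above for this input.

```python
def split_before_matches(iterable, pred):
    """Hypothetical stand-in: start a new sublist at every item where pred is true."""
    current = []
    for item in iterable:
        if pred(item):
            # The sublist collected so far is emitted even when it is empty,
            # which is what happens when the very first item matches.
            yield current
            current = []
        current.append(item)
    yield current


lst = [1, None, 1, 1, None, None, 1]
result = list(split_before_matches(lst, lambda x: x is not None))

print(result)      # starts with an empty sublist
print(result[1:])  # the desired grouping
```

Because the very first item is non-None, a split happens before anything has been collected, so the first sublist is empty and has to be sliced off.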

solution1
2 ACCEPTED 2017-04-06 01:22:44

solution2
1 2017-04-06 01:25:34