
Iterate over n successive elements of list (with overlapping)

The itertools Python module implements some basic building blocks for iterators. As the documentation says, "they form an iterator algebra". I expected to find a succinct way of doing the following iteration with the module, but could not. Given a list of ordered real numbers, for example

a = [1.0,1.5,2.0,2.5,3.0]

... return a new list (or just iterate) grouping every n successive elements, say n = 2:

b = [(1.0,1.5),(1.5,2.0),(2.0,2.5),(2.5,3.0)]

The way I found of doing this was as follows. First, split the list in two by even and odd indexes:

evens, odds = a[::2], a[1::2]

Then construct the new list:

b = [(even, odd) for even, odd in zip(evens, odds)]
b = sorted(b + [(odd, even) for even, odd in zip(evens[1:], odds)])

In essence, it is similar to a moving mean.

Is there a succinct way of doing this (with or without itertools)?


PS.:

Application

Imagine the list a as the set of timestamps of some events that occurred during an experiment:

timestamp       event
47.8            1a
60.5            1b
67.4            2a
74.5            2b
78.5            1a
82.2            1b
89.5            2a
95.3            2b
101.7           1a
110.2           1b
121.9           2a
127.1           2b

...

This code is being used to segment those events according to different temporal windows. Right now I am interested in the data between 2 successive events; n > 2 would be used only for exploratory purposes.

This is precisely what the pairwise recipe from the itertools documentation is for (for n=2, that is).

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

Demo:

>>> b = [1.0,1.5,2.0,2.5,3.0]
>>> list(pairwise(b))
[(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
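
Applied to the timestamps from the question, the same recipe yields one (start, end) window per pair of successive events. The sketch below is only an illustration: the events list copies the first rows of the question's table, while the segments name and its tuple layout are assumptions made for the example. On Python 3.10 and later, itertools.pairwise is available directly; the recipe is repeated here so the block also runs on older versions.

```python
from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

# First rows of the question's table: (timestamp, event label).
events = [(47.8, "1a"), (60.5, "1b"), (67.4, "2a"), (74.5, "2b")]

# Hypothetical layout: (label of the opening event, start time, end time).
segments = [(label, start, end)
            for (start, label), (end, _) in pairwise(events)]

for label, start, end in segments:
    print(label, start, end)
```

Each tuple describes the interval that opens at one event and closes at the next, so four events give three segments.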

If you are looking for variable group sizes, see user2357112's answer (I like the approach). More generally, you can implement a sliding window iterator and take slices of it; there are many approaches to that.


As an aside, here is a possibly poorly performing but amusing one-line window that isn't on the linked question. You can slice its output to control the overlap. It uses the yield from syntax (Python 3.3+) to combine generators.

from itertools import tee, islice
def roll_window(it, sz):
    yield from zip(*[islice(it, g, None) for g, it in enumerate(tee(it, sz))])

Demo:

>>> b = [1.0,1.5,2.0,2.5,3.0, 3.5, 4.0, 4.5]
>>> list(islice(roll_window(b, 3), None, None, 2))
[(1.0, 1.5, 2.0), (2.0, 2.5, 3.0), (3.0, 3.5, 4.0)]

For n = 2, you can just do

b = zip(a, a[1:])  # or list(zip(...)) on Python 3 if you really want a list

For fixed n, the technique is similar:

# n = 4
b = zip(a, a[1:], a[2:], a[3:])

For variable n, you could zip a variable number of slices, or (especially if the window size is close to the size of a) you could use slicing to take windows directly:

b = zip(*[a[i:] for i in range(n)])  # range works on Python 2 and 3
# or
b = [tuple(a[i:i+n]) for i in range(len(a)-n+1)]
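
As a quick check, the two forms can be compared on the list from the question. This is a minimal sketch assuming Python 3 (hence range and list(zip(...))); the names by_zip and by_slice are made up for the example.

```python
a = [1.0, 1.5, 2.0, 2.5, 3.0]
n = 3

# Zip n progressively shifted slices; zip stops at the shortest one.
by_zip = list(zip(*[a[i:] for i in range(n)]))

# Take each window directly as a slice of length n.
by_slice = [tuple(a[i:i + n]) for i in range(len(a) - n + 1)]

print(by_zip)
print(by_zip == by_slice)
```

Both produce the same three windows here. The first builds n shifted copies of the list, while the second builds one short slice per window.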

If a is not a list, you could generalize the pairwise recipe from the itertools docs:

import copy
import itertools

def nwise(iterable, n):
    # Make n tees at successive positions along the iterable.
    tees = list(itertools.tee(iterable, 1))
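    for _ in range(n-1):
        # Copy the last tee, then advance the copy by one item.
        tees.append(copy.copy(tees[-1]))
        # The default keeps a too-short iterable from raising StopIteration.
        next(tees[-1], None)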

    return zip(*tees)
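
For comparison, here is a sketch of a variant that avoids copy by creating all n tees up front and advancing the i-th one by i items. The name nwise_tee and the squares generator are hypothetical and used only for this illustration; this is not the answer's own code.

```python
from itertools import islice, tee

def nwise_tee(iterable, n):
    # Hypothetical variant: make n tees at once, then advance
    # the i-th tee by i items before zipping them together.
    tees = tee(iterable, n)
    for i, t in enumerate(tees):
        # islice(t, i, i) yields nothing but consumes i items from t.
        next(islice(t, i, i), None)
    return zip(*tees)

def squares():
    # A generator has no len() and cannot be sliced.
    for k in range(6):
        yield k * k

windows = list(nwise_tee(squares(), 3))
print(windows)
```

If the input has fewer than n items, the last tee is exhausted before zipping and the result is simply empty.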

Using a generator:

def groupListByN(lst, n):
  for i in range(len(lst)-n+1):
    yield lst[i:i+n]

a = [1.0,1.5,2.0,2.5,3.0]
myDoubleList = [group for group in groupListByN(a, 2)]
myTripleList = [group for group in groupListByN(a, 3)]

print(myDoubleList)
print(myTripleList)

Result:

[[1.0, 1.5], [1.5, 2.0], [2.0, 2.5], [2.5, 3.0]]
[[1.0, 1.5, 2.0], [1.5, 2.0, 2.5], [2.0, 2.5, 3.0]]

I think this solution is pretty succinct.

Slicing a sequence makes a copy, so if you have a list of 1000 elements and n is 3, then lst, lst[1:] and lst[2:] together put 2997 (1000+999+998) items in memory.

Instead, you can do it in a more generic way that works for all iterables:

def n_wise(iterable, n=2):
    from collections import deque
    it = iter(iterable)

    try:
        d = deque([next(it) for _ in range(n)], maxlen=n)
        while True:
            yield tuple(d)
            d.append(next(it))
    except StopIteration:
        pass

This time only 1000 + n items are in memory at a time.
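
The bound on the window comes from the deque's maxlen argument: once the deque is full, each append drops the oldest item from the other end. A minimal illustration, with values chosen only for the example:

```python
from collections import deque

d = deque([1.0, 1.5, 2.0], maxlen=3)
print(tuple(d))

# Appending to a full bounded deque discards the item at the opposite end.
d.append(2.5)
print(tuple(d))
print(len(d))
```

The deque never grows beyond maxlen, so the window itself holds n items no matter how long the input is.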

test:

a = [1.0, 1.5, 2.0, 2.5, 3.0]
print(list(n_wise(a)))
print(list(n_wise(a, 3)))

# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
# [(1.0, 1.5, 2.0), (1.5, 2.0, 2.5), (2.0, 2.5, 3.0)]

As mentioned, the pairwise recipe does overlapping pairs.

This recipe is also implemented in an external library, more_itertools, among other helpful windowing tools:

import more_itertools as mit


a = [1.0, 1.5, 2.0, 2.5, 3.0]

list(mit.pairwise(a))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]

list(mit.windowed(a, n=2))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]

list(mit.stagger(a, offsets=(0, 1)))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]

Notice that with more_itertools.windowed you can control n, the size of the sliding window (and even the amount of overlap, via a step parameter, if needed). This tool may be useful in your exploration.

Install this library via pip install more_itertools.
