The itertools Python module implements some basic building blocks for iterators. As the docs say, "they form an iterator algebra". I expected to find a succinct way of doing the following iteration with the module, but could not. Given a list of ordered real numbers, for example
a = [1.0,1.5,2.0,2.5,3.0]
... return a new list (or just iterate) grouping consecutive elements by some value n, say 2:
b = [(1.0,1.5),(1.5,2.0),(2.0,2.5),(2.5,3.0)]
The way I found of doing this was as follows. First, split the list in two, by even and odd indices:
evens, odds = a[::2], a[1::2]
Then construct the new list:
b = [(even, odd) for even, odd in zip(evens, odds)]
b = sorted(b + [(odd, even) for even, odd in zip(evens[1:], odds)])
In essence, it is similar to a moving mean.
Is there a succinct way of doing this (with or without itertools)?
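To make the moving-mean analogy concrete, here is a minimal sketch that averages every window of n consecutive values. The helper name moving_mean is hypothetical (it is not from the question), and it assumes values is a list so it can be sliced.

```python
def moving_mean(values, n=2):
    """Hypothetical helper: mean of every window of n consecutive values."""
    # Zip n shifted slices so each tuple holds n neighbouring values.
    windows = zip(*[values[i:] for i in range(n)])
    return [sum(w) / n for w in windows]

a = [1.0, 1.5, 2.0, 2.5, 3.0]
print(moving_mean(a))     # [1.25, 1.75, 2.25, 2.75]
print(moving_mean(a, 3))  # [1.5, 2.0, 2.5]
```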
PS (application):
Imagine the list a as the set of timestamps of events that occurred during an experiment:
timestamp event
47.8 1a
60.5 1b
67.4 2a
74.5 2b
78.5 1a
82.2 1b
89.5 2a
95.3 2b
101.7 1a
110.2 1b
121.9 2a
127.1 2b
...
This code is being used to segment those events according to different temporal windows. Right now I am interested in the data between 2 successive events; n > 2 would be used only for exploratory purposes.
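A minimal sketch of that segmentation for n = 2, using the first twelve rows of the table above and assuming the timestamps are already sorted: each event is paired with the one that follows it, and the gap between them is computed. The names records and segments are illustrative only.

```python
# Timestamps and event labels copied from the first 12 rows of the table above.
timestamps = [47.8, 60.5, 67.4, 74.5, 78.5, 82.2, 89.5, 95.3, 101.7, 110.2, 121.9, 127.1]
events = ["1a", "1b", "2a", "2b", "1a", "1b", "2a", "2b", "1a", "1b", "2a", "2b"]

# Pair each (time, event) record with the next one: (r0, r1), (r1, r2), ...
records = list(zip(timestamps, events))
segments = [
    (start_event, end_event, end - start)
    for (start, start_event), (end, end_event) in zip(records, records[1:])
]

for start_event, end_event, duration in segments:
    print(f"{start_event} -> {end_event}: {duration:.1f}")
```

Each segment covers the interval between two successive events; wider windows would zip more shifted slices of records.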
This is precisely what the pairwise itertools recipe is for (for n=2, that is).
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
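Since Python 3.10 this recipe ships in the standard library as itertools.pairwise, so newer code can import it directly. A sketch that prefers the built-in and falls back to the recipe on older interpreters:

```python
import itertools

# itertools.pairwise exists only on Python 3.10+; fall back to the tee recipe otherwise.
try:
    from itertools import pairwise
except ImportError:
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = itertools.tee(iterable)
        next(b, None)
        return zip(a, b)

# Works on any iterable, including a one-shot generator.
gen = (x / 2 for x in range(2, 7))  # 1.0, 1.5, 2.0, 2.5, 3.0
print(list(pairwise(gen)))
```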
Demo:
>>> b = [1.0,1.5,2.0,2.5,3.0]
>>> list(pairwise(b))
[(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
If you are looking for variable group sizes, see user2357112's answer (I like the approach) or, more generally, implement a sliding-window iterator and take slices of it; there are many approaches to that.
As an aside, here is a possibly poorly performing but amusing one-line window that isn't in the linked question. You can slice its output to control the overlap, and it uses the yield from syntax (new in Python 3.3) to delegate to the zipped iterator.
from itertools import tee, islice
def roll_window(it, sz):
    # sz independent copies of the input, the g-th one advanced by g items, zipped together.
    yield from zip(*[islice(t, g, None) for g, t in enumerate(tee(it, sz))])
Demo:
>>> b = [1.0,1.5,2.0,2.5,3.0, 3.5, 4.0, 4.5]
>>> list(islice(roll_window(b, 3), None, None, 2))
[(1.0, 1.5, 2.0), (2.0, 2.5, 3.0), (3.0, 3.5, 4.0)]
For n = 2, you can just do
b = zip(a, a[1:]) # or list(zip(...)) on Python 3 if you really want a list
For fixed n, the technique is similar:
# n = 4
b = zip(a, a[1:], a[2:], a[3:])
For variable n, you could zip a variable number of slices or, especially if the window size is close to the size of a, use slicing to take windows directly:
b = zip(*[a[i:] for i in range(n)])
# or
b = [tuple(a[i:i+n]) for i in range(len(a)-n+1)]
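A quick sketch wrapping both variable-n forms for Python 3 and checking that they agree, including when n exceeds len(a) and there are no windows at all. The function names windows_zip and windows_slice are hypothetical.

```python
def windows_zip(a, n):
    """Windows of size n built by zipping n shifted slices."""
    return list(zip(*[a[i:] for i in range(n)]))

def windows_slice(a, n):
    """Windows of size n taken directly by slicing."""
    return [tuple(a[i:i + n]) for i in range(len(a) - n + 1)]

a = [1.0, 1.5, 2.0, 2.5, 3.0]
for n in range(1, len(a) + 2):
    # Both forms agree, including n > len(a), where both return [].
    assert windows_zip(a, n) == windows_slice(a, n)
print(windows_slice(a, 3))  # [(1.0, 1.5, 2.0), (1.5, 2.0, 2.5), (2.0, 2.5, 3.0)]
```

The zip form copies n slices of the whole list up front, while the slicing form copies only n items per window.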
If a is not a list, you could generalize the pairwise recipe from the itertools docs:
import copy
import itertools
def nwise(iterable, n):
    # Make n tees at successive positions along the iterable.
    tees = list(itertools.tee(iterable, 1))
    for _ in range(n-1):
        # Copy the most advanced tee, then advance the copy by one item.
        tees.append(copy.copy(tees[-1]))
        next(tees[-1], None)
    return zip(*tees)
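To show that nwise works on input that cannot be sliced, here it is applied to a generator. The function is repeated so the snippet runs on its own; the timestamps generator is a hypothetical stand-in for an event stream, using the first few times from the table in the question.

```python
import copy
import itertools

def nwise(iterable, n):
    # Make n tees at successive positions along the iterable.
    tees = list(itertools.tee(iterable, 1))
    for _ in range(n - 1):
        tees.append(copy.copy(tees[-1]))
        next(tees[-1], None)  # default avoids StopIteration on short input
    return zip(*tees)

def timestamps():
    """Hypothetical stand-in for a stream of event times."""
    yield from [47.8, 60.5, 67.4, 74.5, 78.5]

print(list(nwise(timestamps(), 2)))
print(list(nwise(timestamps(), 3)))
```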
Using a generator:
def groupListByN(lst, n):
    for i in range(len(lst)-n+1):
        yield lst[i:i+n]
a = [1.0,1.5,2.0,2.5,3.0]
myDoubleList = [group for group in groupListByN(a, 2)]
myTripleList = [group for group in groupListByN(a, 3)]
print(myDoubleList)
print(myTripleList)
Result:
[[1.0, 1.5], [1.5, 2.0], [2.0, 2.5], [2.5, 3.0]]
[[1.0, 1.5, 2.0], [1.5, 2.0, 2.5], [2.0, 2.5, 3.0]]
I think this solution is pretty succinct.
Slicing copies a sequence, so if you have a list of 1000 elements and n is 3, lst, lst[1:] and lst[2:] together hold 2997 (1000 + 999 + 998) element references in memory.
Instead, you can do it in a more generic way that works for any iterable:
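from collections import deque

def n_wise(iterable, n=2):
    it = iter(iterable)
    try:
        # Fill the first window; StopIteration here means fewer than n items.
        d = deque([next(it) for _ in range(n)], maxlen=n)
        while True:
            yield tuple(d)
            # maxlen=n drops the oldest item as each new one is appended.
            d.append(next(it))
    except StopIteration:
        pass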
This time only 1000 + n references are in memory at once: the original list plus the n-item deque.
Test:
a = [1.0, 1.5, 2.0, 2.5, 3.0]
print(list(n_wise(a)))
print(list(n_wise(a, 3)))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
# [(1.0, 1.5, 2.0), (1.5, 2.0, 2.5), (2.0, 2.5, 3.0)]
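Recent versions of the itertools documentation include a sliding_window recipe built on the same deque idea. This sketch follows that idea (it is not copied verbatim): it pre-loads the deque with islice instead of using try/except, and it is lazy enough to consume an infinite iterator such as itertools.count.

```python
from collections import deque
from itertools import count, islice

def sliding_window(iterable, n):
    """Yield overlapping tuples of n consecutive items from any iterable."""
    it = iter(iterable)
    # Pre-load the first n-1 items; maxlen=n discards the oldest item on each append.
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

a = [1.0, 1.5, 2.0, 2.5, 3.0]
print(list(sliding_window(a, 2)))
# Take only the first four windows of the infinite sequence 0, 1, 2, ...
print(list(islice(sliding_window(count(), 3), 4)))
```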
As mentioned, the pairwise recipe produces overlapping pairs.
This recipe is also implemented in an external library, more_itertools, among other helpful windowing tools:
import more_itertools as mit
a = [1.0, 1.5, 2.0, 2.5, 3.0]
list(mit.pairwise(a))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
list(mit.windowed(a, n=2))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
list(mit.stagger(a, offsets=(0, 1)))
# [(1.0, 1.5), (1.5, 2.0), (2.0, 2.5), (2.5, 3.0)]
Notice that with more_itertools.windowed you can control n, the size of the sliding window (and even the amount of overlap, via a step parameter, if needed). This tool may be useful in your exploration.
Install the library with: pip install more_itertools
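If installing a third-party package is not an option, a stepped window can be sketched with the standard library alone. The helper stepped_windows is hypothetical, not part of more_itertools; unlike windowed, which may pad an incomplete final window (check its documentation for the exact behaviour), this sketch simply drops it.

```python
from collections import deque
from itertools import islice

def stepped_windows(iterable, n, step=1):
    """Hypothetical helper: tuples of n consecutive items, a new window every `step` items.

    step=1 gives fully overlapping windows; step=n gives non-overlapping chunks.
    An incomplete trailing window is dropped.
    """
    if n < 1 or step < 1:
        raise ValueError("n and step must be positive")
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)
    while len(window) == n:
        yield tuple(window)
        # Slide forward by `step` items; maxlen=n discards items from the left.
        new_items = list(islice(it, step))
        window.extend(new_items)
        if len(new_items) < step:
            break

b = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
print(list(stepped_windows(b, 3, step=2)))
# [(1.0, 1.5, 2.0), (2.0, 2.5, 3.0), (3.0, 3.5, 4.0)]
```

With these inputs it reproduces the output of the roll_window demo, without building sz tee copies.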