
Returning a generator inside of a generator

I have a list of integers and a list of tuples (representing intervals). I want to write a method that, for each tuple, returns the sublist of integers contained in the interval, but I want to do it with generators.

For the following input:

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

the sublists should be [1, 2] and [2, 3, 4].

My attempt:

def gen_intervals(l, intervals):
    for e in l:
        for i in intervals:
            if e > i[0] and e < i[1]:
                yield e

However, this gives me a single flat stream of elements, because the generator yields one element at a time. What I want is to yield, for each interval, a generator of the elements in that interval.

Then, I would use it like this:

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)

Important:

  • The list is sorted and so are the intervals (even though they may overlap). The intervals follow the standard mathematical representation of a closed interval: [left endpoint, right endpoint] with left endpoint < right endpoint. For any two intervals u and v, neither is a subset of the other. By sorted I mean that their right endpoints are in ascending order.

  • I really want to iterate over elements first and intervals second, because the list of elements is likely to be very long and I want to iterate over it only once. The list of elements is much longer than the list of intervals, but the exact lengths are arbitrary.

You can simply use a generator expression on the yield line:

def gen_intervals(elements, intervals):
    for vmin, vmax in intervals:
        yield (elm for elm in elements if (vmin <= elm <= vmax))

Which gives:

l = [1, 2, 3, 4, 5]
intervals = [(2, 4), (1, 2)]

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)
    print()

Output:

2
3
4

1
2
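
One caveat worth sketching: the inner generator expression looks up vmin and vmax lazily, so if the inner generators are collected first (for example with list(...)) and only consumed afterwards, they all filter by the last interval. A possible workaround, assuming you want each window to keep its own bounds, is to create the inner generator in a helper function. The names _in_interval and gen_intervals_bound are hypothetical and not part of the answer above.

```python
def gen_intervals(elements, intervals):
    # Version from the answer above, repeated here for comparison.
    for vmin, vmax in intervals:
        yield (elm for elm in elements if vmin <= elm <= vmax)


def _in_interval(elements, vmin, vmax):
    # Hypothetical helper: vmin and vmax are local to this call,
    # so each generator keeps the bounds it was created with.
    return (elm for elm in elements if vmin <= elm <= vmax)


def gen_intervals_bound(elements, intervals):
    for vmin, vmax in intervals:
        yield _in_interval(elements, vmin, vmax)


l = [1, 2, 3, 4, 5]
intervals = [(2, 4), (1, 2)]

# Collect all inner generators first, then consume them.
late = [list(w) for w in list(gen_intervals(l, intervals))]
bound = [list(w) for w in list(gen_intervals_bound(l, intervals))]

print(late)   # [[1, 2], [1, 2]]  (both windows use the last bounds)
print(bound)  # [[2, 3, 4], [1, 2]]
```

When each inner generator is fully consumed before the next one is requested, as in the loop above, both versions behave the same.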

You can do it like this. Each interval generator has its own buffer. When asked for items, it first flushes its buffer and then pulls the next element from the list. If that element belongs to its own interval, it yields it; otherwise it places the element in the buffer of the interval it belongs to and tries the next element. The list is consumed only once, and each element is assigned to the first interval that contains it.

def items_by_iterval(lst, intervals):
    list_iter = iter(lst)
    buffers = [[] for _ in intervals]

    def interval_iter(n):
        while True:
            if buffers[n]:
                yield from buffers[n]
                buffers[n] = []

            try:
                k = next(list_iter)
            except StopIteration:
                return

            for m, (a, b) in enumerate(intervals):
                if a <= k <= b:
                    if m == n:
                        yield k
                    else:
                        buffers[m].append(k)
                    break

    return [interval_iter(n) for n, _ in enumerate(intervals)]


##

lst = [1, 9, 2, 8, 3, 7, 4, 6, 5, 1, 9, 2, 8, 3, 7, 4, 6, 5, ]
intervals = [(5, 7), (2, 4), (8, 10)]

for ii in items_by_iterval(lst, intervals):
    for k in ii:
        print(k, end=' ')
    print()

This prints:

7 6 5 7 6 5
2 3 4 2 3 4
9 8 9 8
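
Because of the break, the code above gives each element to only one interval. Since the question allows overlapping intervals, here is a minimal sketch of a variant that files each element into every matching buffer. It assumes closed intervals; the names items_by_interval_overlapping and pull are hypothetical, and collections.deque is swapped in for the list buffers.

```python
from collections import deque


def items_by_interval_overlapping(lst, intervals):
    source = iter(lst)                      # shared, consumed only once
    buffers = [deque() for _ in intervals]  # one FIFO buffer per interval

    def pull():
        # Read one element and append it to every buffer whose
        # interval contains it. Return False when the list is exhausted.
        try:
            k = next(source)
        except StopIteration:
            return False
        for buf, (a, b) in zip(buffers, intervals):
            if a <= k <= b:
                buf.append(k)
        return True

    def interval_iter(n):
        buf = buffers[n]
        while True:
            while buf:
                yield buf.popleft()
            if not pull():
                return

    return [interval_iter(n) for n in range(len(intervals))]


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
overlap_result = [list(g) for g in items_by_interval_overlapping(l, intervals)]
print(overlap_result)  # [[1, 2], [2, 3, 4]]
```

The trade-off is memory: if one interval's generator is drained before the others are touched, the remaining elements accumulate in the other buffers.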

Instead of an explicit loop over the elements, you could use map to produce a True or False selector for each element for a given interval. Then you could use itertools.compress to create an iterator that yields the elements that are in the respective interval:

import itertools

l = [1, 2, 3, 4, 5]
intervals = [(2, 4),]
for interval in intervals:
    # range(i + 1, j) excludes both endpoints and only works for integers
    mapped = map(lambda el: el in range(interval[0] + 1, interval[1]), l)
    # list(mapped) would be [False, False, True, False, False]
    res = itertools.compress(l, mapped)  # save this in case of more intervals

print(next(res))  # prints 3
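
To fit the shape asked for in the question, the same idea could be wrapped in a generator that yields one compress iterator per interval. This is only a sketch under two assumptions: the intervals are closed (endpoints included), and elements is a re-iterable sequence such as a list. The name compress_intervals is hypothetical. The bounds are bound as lambda defaults so that each selector keeps its own interval even if it is consumed later.

```python
import itertools


def compress_intervals(elements, intervals):
    for left, right in intervals:
        # Lazy True/False selectors; lo and hi are fixed per interval
        # through default arguments.
        selectors = map(lambda el, lo=left, hi=right: lo <= el <= hi, elements)
        yield itertools.compress(elements, selectors)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
compress_result = [list(w) for w in list(compress_intervals(l, intervals))]
print(compress_result)  # [[1, 2], [2, 3, 4]]
```

Note that this still walks the whole list once per interval.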

Based on @paime's solution:

def iterate_interval(l, interval):
    for e in l:
        if e > interval[0]:
            if e < interval[1]:
                yield e
            else:
                break


def generate_windows(l, intervals):
    for interval in intervals:
        yield iterate_interval(l, interval)

This is more efficient because the list is sorted: once an element reaches or exceeds the right endpoint of the interval, we can stop iterating. Each interval still scans the list from the beginning, though.
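
The same early exit can be sketched with itertools.takewhile instead of an explicit break. The strict comparisons follow the code above (open intervals); switching to <= and >= would give closed intervals. The names iterate_interval_tw and generate_windows_tw are hypothetical.

```python
from itertools import takewhile


def iterate_interval_tw(l, interval):
    left, right = interval
    # takewhile stops at the first element >= right, like the break above.
    # This relies on l being sorted in ascending order.
    below_right = takewhile(lambda e: e < right, l)
    return (e for e in below_right if e > left)


def generate_windows_tw(l, intervals):
    for interval in intervals:
        yield iterate_interval_tw(l, interval)


l = [1, 2, 3, 4, 5]
intervals = [(0, 3), (2, 6)]
tw_result = [list(w) for w in generate_windows_tw(l, intervals)]
print(tw_result)  # [[1, 2], [3, 4, 5]]
```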

Probably faster than yours, as it does two quick binary searches to find the range instead of comparing every element with the interval bounds. Also, bisect and islice run in C (in CPython), which is faster than a Python loop.

from bisect import bisect_left, bisect_right
from itertools import islice

def generate_windows(l, intervals):
    for left, right in intervals:
        yield islice(
            l,
            bisect_left(l, left),
            bisect_right(l, right)
        )

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

for window in generate_windows(l, intervals):
    print(*window)

Output:

1 2
2 3 4
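
One detail to keep in mind: islice works on arbitrary iterables, so it has to step over the skipped elements one by one before reaching the start index. If l is a list (or another indexable sequence), a possible variant is to index into it directly. This is a sketch under that assumption, and generate_windows_indexed is a hypothetical name.

```python
from bisect import bisect_left, bisect_right


def generate_windows_indexed(l, intervals):
    for left, right in intervals:
        lo = bisect_left(l, left)    # first index with l[i] >= left
        hi = bisect_right(l, right)  # first index with l[i] > right
        # range(lo, hi) is evaluated right away, so each generator
        # keeps its own index range.
        yield (l[i] for i in range(lo, hi))


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
indexed_result = [list(w) for w in list(generate_windows_indexed(l, intervals))]
print(indexed_result)  # [[1, 2], [2, 3, 4]]
```

Whether this is actually faster than islice depends on the sizes involved, since the per-element work moves back into Python.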

I have some other ideas that might be even faster, and usually I'd write a benchmark comparing the various solutions, but for that I'd need to know typical sizes, including average number of list elements per interval, and you didn't answer that.
