Returning generators from inside a generator

Question

I have a list of integers and a list of tuples (representing intervals). I want to write a method that, for each tuple, returns the sublist of integers contained in the interval, but I want to do it with generators.

For the following input:

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

the sublists should be: [1, 2] and [2, 3, 4].

My attempt:

def gen_intervals(l, intervals):
    for e in l:
        for i in intervals:
            if e > i[0] and e < i[1]:
                yield e

However, this gives me a flat sequence of elements, because the generator yields one element at a time. What I want is to yield, for each interval, a generator of the elements in that interval.

Then, I would use it like this:

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)

Important:

The list is sorted and so are the intervals (even though they may overlap). The intervals follow the standard mathematical representation of an interval: [left endpoint, right endpoint] with left endpoint < right endpoint. For any two intervals u, v, neither can be a subset of the other. By sorted I mean that their right endpoints are in ascending order.
I really want to iterate over elements first and intervals second, because the list of elements is likely to be very, very long, so I want to iterate over that list only once. The length of the list of elements is >> the length of the list of intervals, but the exact lengths are arbitrary.

Answer 1

You can simply yield a generator expression:

def gen_intervals(elements, intervals):
    for vmin, vmax in intervals:
        yield (elm for elm in elements if (vmin <= elm <= vmax))

It can be used like this:

l = [1, 2, 3, 4, 5]
intervals = [(2, 4), (1, 2)]

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)
    print()
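One caveat with yielding a generator expression from inside the loop: the expression looks up vmin and vmax lazily, so if all the inner generators are collected first (for example with list(...)) and consumed afterwards, every one of them sees the bounds of the last interval. The sketch below is one possible way around that, not part of the original answer; the helper name _in_interval and the function name gen_intervals_bound are made up for illustration.

```python
def _in_interval(elements, vmin, vmax):
    # Hypothetical helper: its parameters freeze the bounds for this one interval.
    return (elm for elm in elements if vmin <= elm <= vmax)


def gen_intervals_bound(elements, intervals):
    for vmin, vmax in intervals:
        yield _in_interval(elements, vmin, vmax)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

# Collect the inner generators first, consume them later.
windows = list(gen_intervals_bound(l, intervals))
result = [list(w) for w in windows]
print(result)  # [[1, 2], [2, 3, 4]]
```

If each inner generator is fully consumed before the outer loop advances, as in the usage above, the original version behaves the same way.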

Answer 2

You can do it like this. Each interval generator has a buffer. When asked for items, it flushes its buffer first and then picks the next list element. If this element happens to be "its" element, it yields it; otherwise it places it in the respective interval's buffer and tries the next element.

def items_by_iterval(lst, intervals):
    list_iter = iter(lst)
    buffers = [[] for _ in intervals]

    def interval_iter(n):
        while True:
            if buffers[n]:
                yield from buffers[n]
                buffers[n] = []

            try:
                k = next(list_iter)
            except StopIteration:
                return

            for m, (a, b) in enumerate(intervals):
                if a <= k <= b:
                    if m == n:
                        yield k
                    else:
                        buffers[m].append(k)
                    break

    return [interval_iter(n) for n, _ in enumerate(intervals)]


##

lst = [1, 9, 2, 8, 3, 7, 4, 6, 5, 1, 9, 2, 8, 3, 7, 4, 6, 5, ]
intervals = [(5, 7), (2, 4), (8, 10)]

for ii in items_by_iterval(lst, intervals):
    for k in ii:
        print(k, end=' ')
    print()

This prints:

7 6 5 7 6 5
2 3 4 2 3 4
9 8 9 8
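Note that the break in the routing loop hands each element to the first matching interval only, while the question allows intervals to overlap. The following is a minimal sketch of a variant that routes an element to every interval containing it, still reading the list only once. It is an adaptation for illustration, not the answer's code; the name items_by_interval_overlapping and the use of collections.deque for the buffers are assumptions.

```python
from collections import deque


def items_by_interval_overlapping(lst, intervals):
    list_iter = iter(lst)                     # shared, so the list is read once
    buffers = [deque() for _ in intervals]    # one FIFO buffer per interval

    def interval_iter(n):
        while True:
            # Drain whatever has been routed to this interval so far.
            while buffers[n]:
                yield buffers[n].popleft()
            try:
                k = next(list_iter)
            except StopIteration:
                return
            # Route k to every interval that contains it, including our own.
            for m, (a, b) in enumerate(intervals):
                if a <= k <= b:
                    buffers[m].append(k)

    return [interval_iter(n) for n in range(len(intervals))]


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
result = [list(it) for it in items_by_interval_overlapping(l, intervals)]
print(result)  # [[1, 2], [2, 3, 4]]
```

As with the original, the buffers can grow large if one interval's iterator is consumed long before the others.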

Answer 3

Instead of looping over the elements yourself, you could use map to produce a True or False value for each element, for each interval. Then you could use itertools.compress to create an iterator that yields the elements that are in the respective interval:

import itertools

l = [1, 2, 3, 4, 5]
intervals = [(2, 4),]
for interval in intervals:
    # use range(i+1, j) to check if element is in interval
    mapped = map(lambda el: el in range(interval[0] + 1, interval[1]), l)
    # mapped = [False, False, True, False, False]
    res = itertools.compress(l, mapped)  # save this in case of more intervals

print(next(res))  # prints 3
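The snippet above handles a single interval and treats the endpoints as exclusive. As a rough sketch of how the same map and itertools.compress idea might extend to several intervals with inclusive endpoints (as in the question's expected output), one option is to wrap it in a generator. The name compress_windows is made up for this example, and the default arguments on the lambda are there only to freeze each interval's bounds.

```python
from itertools import compress


def compress_windows(elements, intervals):
    for left, right in intervals:
        # Default arguments freeze this interval's bounds inside the lambda.
        selectors = map(lambda el, lo=left, hi=right: lo <= el <= hi, elements)
        # compress keeps the elements whose selector is truthy, lazily.
        yield compress(elements, selectors)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
result = [list(w) for w in compress_windows(l, intervals)]
print(result)  # [[1, 2], [2, 3, 4]]
```

Each window still walks the whole list, so this does not take advantage of the list being sorted.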

Answer 4

Based on @paime's solution:

def iterate_interval(l, interval):
    for e in l:
        if e > interval[0]:
            if e < interval[1]:
                yield e
            else:
                break


def generate_windows(l, intervals):
    for interval in intervals:
        yield iterate_interval(l, interval)

This is more efficient because the list is sorted: once an element reaches the right endpoint of the interval, we do not need to continue iterating over the remaining elements.
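The same early-stop idea can also be written with itertools.dropwhile and itertools.takewhile. The sketch below uses inclusive endpoints, matching the question's expected output rather than the strict comparisons above, and assumes the list is sorted. The function names iterate_interval_inclusive and generate_windows_inclusive are hypothetical.

```python
from itertools import dropwhile, takewhile


def iterate_interval_inclusive(elements, interval):
    left, right = interval
    # Skip everything below the left endpoint...
    after_left = dropwhile(lambda e: e < left, elements)
    # ...then stop at the first element above the right endpoint.
    return takewhile(lambda e: e <= right, after_left)


def generate_windows_inclusive(elements, intervals):
    for interval in intervals:
        yield iterate_interval_inclusive(elements, interval)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
result = [list(w) for w in generate_windows_inclusive(l, intervals)]
print(result)  # [[1, 2], [2, 3, 4]]
```

Each window still scans from the start of the list up to its right endpoint, so the list is not traversed just once overall.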

Answer 5

This is probably faster than yours, as it does two quick binary searches to find the range instead of comparing every element with the interval bounds. Also, C is faster than Python: in CPython, bisect and islice are implemented in C.

from bisect import bisect_left, bisect_right
from itertools import islice

def generate_windows(l, intervals):
    for left, right in intervals:
        yield islice(
            l,
            bisect_left(l, left),
            bisect_right(l, right)
        )

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

for window in generate_windows(l, intervals):
    print(*window)

Output:

1 2
2 3 4
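One thing to keep in mind is that islice works through the iterator protocol, so it still steps over the elements before the start index one by one. If the elements are in a list (or anything else that supports indexing), a possible alternative is to index directly into the range found by the two binary searches. This is only a sketch under that assumption; the names window and generate_windows_indexed are made up, and whether it is actually faster would need a benchmark.

```python
from bisect import bisect_left, bisect_right


def window(elements, left, right):
    # Both binary searches rely on `elements` being sorted.
    lo = bisect_left(elements, left)
    hi = bisect_right(elements, right)
    # Index directly into the list instead of skipping the first `lo` items.
    return (elements[i] for i in range(lo, hi))


def generate_windows_indexed(elements, intervals):
    for left, right in intervals:
        yield window(elements, left, right)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
result = [list(w) for w in generate_windows_indexed(l, intervals)]
print(result)  # [[1, 2], [2, 3, 4]]
```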

I have some other ideas that might be even faster, and usually I'd write a benchmark comparing the various solutions, but for that I'd need to know typical sizes, including the average number of list elements per interval, and you didn't answer that.
