
Returning a generator inside of a generator

I have a list of integers and a list of tuples (representing intervals). I want to write a method that, for each tuple, returns the sublist of integers contained in that interval, but I want to do it with generators.

For the following input:

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

the sublists should be [1, 2] and [2, 3, 4].

My attempt:

def gen_intervals(l, intervals):
    for e in l:
        for i in intervals:
            if e > i[0] and e < i[1]:
                yield e

However, this gives me a flat sequence of elements, because the generator yields one element at a time. What I want is to yield, for each interval, a generator of the elements in that interval.

Then I would use it like this:

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)

Important:

  • The list is sorted and so are the intervals (even though they may overlap). The intervals follow the standard mathematical notation [left endpoint, right endpoint], with left endpoint < right endpoint. For any two intervals u and v, neither is a subset of the other. By sorted I mean that their right endpoints are in ascending order.

  • I really want to iterate over the elements first and the intervals second, because the list of elements is likely to be very, very long, so I want to iterate over it only once. The list of elements is much longer than the list of intervals, but the exact lengths are arbitrary.

You can simply use a generator expression on the yield line:

def gen_intervals(elements, intervals):
    for vmin, vmax in intervals:
        yield (elm for elm in elements if (vmin <= elm <= vmax))

Which gives:

l = [1, 2, 3, 4, 5]
intervals = [(2, 4), (1, 2)]

for interval in gen_intervals(l, intervals):
    for e in interval:
        print(e)
    print()

2
3
4

1
2
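One caveat with the generator expression above: it looks up vmin and vmax only when it is consumed, so it relies on each inner generator being used before the outer one advances. The following is a minimal sketch of that behavior and one possible workaround; the names gen_intervals_lazy, gen_intervals_bound, and _in_interval are made up for this illustration and are not part of the original answer.

```python
def gen_intervals_lazy(elements, intervals):
    # Same shape as gen_intervals above: the inner generator expression
    # reads vmin and vmax only when it is actually iterated.
    for vmin, vmax in intervals:
        yield (elm for elm in elements if vmin <= elm <= vmax)


def _in_interval(elements, vmin, vmax):
    # Hypothetical helper: vmin and vmax are bound once per call,
    # so each returned generator keeps its own bounds.
    return (elm for elm in elements if vmin <= elm <= vmax)


def gen_intervals_bound(elements, intervals):
    for vmin, vmax in intervals:
        yield _in_interval(elements, vmin, vmax)


l = [1, 2, 3, 4, 5]
intervals = [(2, 4), (1, 2)]

# Collect all inner generators first, then consume them afterwards.
late = [list(g) for g in list(gen_intervals_lazy(l, intervals))]
bound = [list(g) for g in list(gen_intervals_bound(l, intervals))]

print(late)   # [[1, 2], [1, 2]]  (every generator sees the last bounds)
print(bound)  # [[2, 3, 4], [1, 2]]
```

With the nested for loop shown in the answer, both versions behave the same, because each inner generator is exhausted before the next interval is requested.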

You can do it like this. Each interval generator has a buffer. When asked for an item, it first flushes its buffer and then takes the next list element. If that element belongs to its own interval, it yields it; otherwise it places the element in the buffer of the interval it belongs to and tries the next element.

def items_by_iterval(lst, intervals):
    list_iter = iter(lst)
    buffers = [[] for _ in intervals]

    def interval_iter(n):
        while True:
            if buffers[n]:
                yield from buffers[n]
                buffers[n] = []

            try:
                k = next(list_iter)
            except StopIteration:
                return

            for m, (a, b) in enumerate(intervals):
                if a <= k <= b:
                    if m == n:
                        yield k
                    else:
                        buffers[m].append(k)
                    break

    return [interval_iter(n) for n, _ in enumerate(intervals)]


##

lst = [1, 9, 2, 8, 3, 7, 4, 6, 5, 1, 9, 2, 8, 3, 7, 4, 6, 5, ]
intervals = [(5, 7), (2, 4), (8, 10)]

for ii in items_by_iterval(lst, intervals):
    for k in ii:
        print(k, end=' ')
    print()

This prints:

7 6 5 7 6 5
2 3 4 2 3 4
9 8 9 8
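Note that the loop above stops at the first matching interval (the break), so an element that lies in two overlapping intervals is delivered only to the first one. Since the question allows overlapping intervals, here is a hedged sketch of a variant that routes each element to every interval containing it. The function name items_by_interval_overlapping and the use of collections.deque are choices made for this illustration, not part of the original answer.

```python
from collections import deque


def items_by_interval_overlapping(lst, intervals):
    source = iter(lst)                      # shared, consumed only once
    buffers = [deque() for _ in intervals]  # one FIFO buffer per interval

    def interval_iter(n):
        while True:
            # Drain whatever has already been routed to this interval.
            while buffers[n]:
                yield buffers[n].popleft()
            try:
                k = next(source)
            except StopIteration:
                return
            # Route k to every interval that contains it (no break).
            for m, (a, b) in enumerate(intervals):
                if a <= k <= b:
                    buffers[m].append(k)

    return [interval_iter(n) for n in range(len(intervals))]


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
result = [list(g) for g in items_by_interval_overlapping(l, intervals)]
print(result)  # [[1, 2], [2, 3, 4]]
```

The list is still traversed only once. The cost is that elements wait in memory until the generator for their interval is consumed.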

Instead of an inner for loop over the elements, you could use map to produce a True or False value per element for each interval. Then you could use itertools.compress to create an iterator that yields the elements in the respective interval:

import itertools

l = [1, 2, 3, 4, 5]
intervals = [(2, 4),]
for interval in intervals:
    # el in range(i + 1, j) is True for integers strictly between i and j
    mapped = map(lambda el: el in range(interval[0] + 1, interval[1]), l)
    # mapped lazily yields False, False, True, False, False
    res = itertools.compress(l, mapped)  # save this in case of more intervals

print(next(res))  # prints 3
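To handle several intervals, the compress iterators could be yielded one per interval. The sketch below is one way to do that under two assumptions that differ from the snippet above: the bounds are treated as inclusive (to match the expected output in the question), and the bounds are bound as lambda default arguments so that each iterator keeps its own interval even if it is consumed later. The name compress_intervals is hypothetical.

```python
import itertools


def compress_intervals(elements, intervals):
    for left, right in intervals:
        # Default arguments freeze the current bounds for this lambda.
        selectors = map(lambda el, lo=left, hi=right: lo <= el <= hi, elements)
        # compress keeps the elements whose selector is true.
        yield itertools.compress(elements, selectors)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
windows = [list(w) for w in list(compress_intervals(l, intervals))]
print(windows)  # [[1, 2], [2, 3, 4]]
```

Like the original snippet, this still scans the whole list once per interval.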

Based on @paime's solution:

def iterate_interval(l, interval):
    for e in l:
        if e > interval[0]:
            if e < interval[1]:
                yield e
            else:
                break


def generate_windows(l, intervals):
    for interval in intervals:
        yield iterate_interval(l, interval)

This is more efficient because, with a sorted list, we can stop iterating over the elements as soon as one reaches or passes the right endpoint of the interval.
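The comparisons above are strict, so both endpoints are excluded, whereas the expected output in the question includes them. A minimal sketch of the same early-exit idea with closed intervals, assuming the list is sorted in ascending order (the *_inclusive names are made up for this example):

```python
def iterate_interval_inclusive(l, interval):
    left, right = interval
    for e in l:
        if e > right:
            break  # sorted input: no later element can be in the interval
        if e >= left:
            yield e


def generate_windows_inclusive(l, intervals):
    for interval in intervals:
        yield iterate_interval_inclusive(l, interval)


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
windows = [list(w) for w in generate_windows_inclusive(l, intervals)]
print(windows)  # [[1, 2], [2, 3, 4]]
```

Each window still starts scanning from the beginning of the list, so elements below the left endpoint are visited again for every interval.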

This is probably faster than yours, as it does two quick binary searches to find the range instead of comparing every element with the interval bounds. Also, bisect and islice run in C (in CPython), which is faster than an equivalent Python loop.

from bisect import bisect_left, bisect_right
from itertools import islice

def generate_windows(l, intervals):
    for left, right in intervals:
        yield islice(
            l,
            bisect_left(l, left),
            bisect_right(l, right)
        )

l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]

for window in generate_windows(l, intervals):
    print(*window)

Output:

1 2
2 3 4

I have some other ideas that might be even faster, and usually I'd write a benchmark comparing the various solutions, but for that I'd need to know typical sizes, including the average number of list elements per interval, and you didn't answer that.
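As a hedged aside on the bisect approach above: islice works on an iterator, so it still steps over the first start items of the list before yielding anything. For a very long list, a variant that indexes the list directly might avoid that skipping. This is only a sketch, not one of the answerer's ideas and not benchmarked; generate_windows_indexed and _window are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def _window(l, lo, hi):
    # lo and hi are bound per call; elements are fetched by index on demand.
    return (l[i] for i in range(lo, hi))


def generate_windows_indexed(l, intervals):
    for left, right in intervals:
        # Binary searches locate the closed interval [left, right] in sorted l.
        yield _window(l, bisect_left(l, left), bisect_right(l, right))


l = [1, 2, 3, 4, 5]
intervals = [(1, 2), (2, 4)]
windows = [list(w) for w in generate_windows_indexed(l, intervals)]
print(windows)  # [[1, 2], [2, 3, 4]]
```

Whether this beats islice in practice depends on the sizes involved, since the per-element indexing here runs in Python rather than C.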
