How to wrap a generator with a filter?

I have a series of connected generators and I want to create a filter that can be used to wrap one of the generators. This filter wrapper should take a generator and a function as parameters. If a data item in the incoming stream does not pass the requirements of the filter, it should be passed downstream to the next generator without going through the wrapped generator. I have made a working example here that should make it clearer what I am trying to achieve:

import functools

is_less_than_three = lambda x: x < 3

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        yield number + 1

def wrapper(generator1, filter_):
    @functools.wraps(generator1)
    def wrapped(generator2):
        for data in generator2:
            if filter_(data):
                yield from generator1([data])
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)

#new generator created
#1
#new generator created
#2
#new generator created
#3
#3
#4
#5

The problem with this is that it requires creating a new generator for each data item. There must be a better way? I have also tried using itertools.tee and splitting the generator, but this causes memory problems when the generators yield values at different rates (they do). How can I accomplish what the above code does without re-creating generators and without causing memory problems?

Edited to add background information below.

As input I will receive large video streams. The video streams may or may not end (could be a webcam). Users are able to choose which image processing steps are carried out on the video frames, thus the order and number of functions will change. Subsequently, the functions should be able to take each other's outputs as inputs.

I have accomplished this by using a series of generators. The input:output ratio of the generators/functions is variable - it could be 1:n, 1:1, or n:1 (for example, extracting several objects (subimages) from an image to be processed separately).

Currently these generators take a few parameters that are repeated among them (not DRY) and I am trying to decrease the number of parameters by refactoring them into separate generators or wrappers. One of the more difficult ones is a filter on the data stream to determine whether or not a function should be applied to the frame (the function could be CPU-intensive and not needed on all frames).

The number of parameters makes the usage of the functions more difficult for the user to understand. It also makes it more difficult for me in that whenever I want to make a change to one of the common parameters, I have to edit it for all functions.

Edit 2: renamed function to generator in the example code to make it clearer.

Edit 3, the solution: Thank you @Blckknght. This can be solved by creating an infinite iterator that passes the value of a local variable to the generator. I modified my example slightly to change add_one to a 1:n generator instead of a 1:1 generator to show how this solution can also work for 1:n generators.

import functools

is_less_than_three = lambda x: x < 3

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        if number == 0:
            yield number - 1
            yield number
        else:
            yield number

def wrapper(generator1, filter_):
    @functools.wraps(generator1)
    def wrapped(generator2):
        local_variable_passer = generator1(iter(lambda: data, object()))
        for data in generator2:
            if filter_(data):
                next_data = next(local_variable_passer)
                if data == 0:
                    yield next_data
                    next_data = next(local_variable_passer)
                    yield next_data
                else:
                    yield next_data
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)

#new generator created
#-1
#0
#1
#2
#3
#4
#5

The architecture is a conditional map - as such, each item must be mapped individually. This means the function should receive one number, not many numbers.


As long as there is a stateless 1:1 connection, use a function instead of a generator.

def add_one(number):  # takes one number
    return number + 1  # provides one number

import functools

def conditional_map(function, condition):
    @functools.wraps(function)
    def wrapped(generator):
        return (
            function(item) if condition(item)
            else item for item in generator
        )
    return wrapped

for answer in conditional_map(add_one, lambda x: x < 3)(range(6)):
    print(answer)

If data must be passed to a stateful "generator", it is a coroutine and should be designed as such. This means that yield is used both to receive and to provide data.

from itertools import count

def add_increment(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield number + increment

Since this is still a 1:1 connection, it can be used with the previous conditional_map.

mapper = add_increment()
next(mapper)  # prime the coroutine - this could be done with a decorator

for answer in conditional_map(mapper.send, lambda x: x < 3)(range(6)):
    print(answer)

If 1:n connections are needed, expect to receive a generator for each input.

def add_some(number):  # takes one number
    yield number - 1
    yield number
    yield number + 1

def conditional_map(function, condition):
    @functools.wraps(function)
    def wrapped(generator):
        for data in generator:
            if condition(data):
                yield from function(data)  # passes exactly *one* item
            else:
                yield data
    return wrapped

If a stateful 1:n connection is required, a coroutine that produces a generator/iterable can be used.如果需要有状态的 1:n 连接,则可以使用生成生成器/可迭代的协程。

def add_increments(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield (number + increment + i for i in (-1, 0, 1))
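For completeness, here is one way the stateful coroutine might be driven by the 1:n conditional_map above. Both definitions are repeated so the sketch runs on its own; the driver code and the condition are illustrative assumptions, not part of the original answer.

```python
import functools
from itertools import count

def add_increments(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide a small generator for this input, then receive the next
        number = yield (number + increment + i for i in (-1, 0, 1))

def conditional_map(function, condition):
    @functools.wraps(function)
    def wrapped(generator):
        for data in generator:
            if condition(data):
                yield from function(data)  # function returns an iterable here
            else:
                yield data
    return wrapped

mapper = add_increments()
next(mapper)  # prime the coroutine
result = list(conditional_map(mapper.send, lambda x: x < 3)(range(6)))
print(result)  # [-1, 0, 1, 1, 2, 3, 3, 4, 5, 3, 4, 5]
```

Note that each generator expression the coroutine yields must be consumed before the next send, since it reads the coroutine's suspended locals lazily.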

As I understand your problem, you have a stream of video frames, and you're trying to create a pipeline of processing functions that modify the stream. Different processing functions might change the number of frames, so a single input frame could result in multiple output frames, or multiple input frames could be consumed before a single output frame is produced. Some functions might be 1:1, but that's not something you can count on.

Your current implementation uses generator functions for all the processing. The output function iterates on the chain, and each processing step in the pipeline requests frames from the one before it using iteration.

The function you're trying to write right now is a sort of selective bypass. You want some frames (those meeting some condition) to get passed in to some already existing generator function, but other frames to skip over the processing and just go directly into the output. Unfortunately, that's probably not possible to do with Python generators. The iteration protocol is just not sophisticated enough to support it.

First off, it is possible to do this for 1:1 with generators, but you can't easily generalize to n:1 or 1:n cases. Here's what it might look like for 1:1:

def selective_processing_1to1(processing_func, condition, input_iterable):
    processing_iterator = processing_func(iter(lambda: input_value, object()))
    for input_value in input_iterable:
        if condition(input_value):
            yield next(processing_iterator)
        else:
            yield input_value
There's a lot of work being done in the processing_iterator creation step. By using the two-argument form of iter with a lambda function and a sentinel object (that will never be yielded), I'm creating an infinite iterator that always yields the current value of the local variable input_value. Then I pass that iterator to the processing_func function. I can selectively call next on the generator object if I want to apply the processing the filter represents to the current value, or I can just yield the value myself without processing it.
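The two-argument form of iter is the heart of this trick, so a minimal standalone illustration may help (the variable names here are illustrative, not from the answer):

```python
data = None
# iter(callable, sentinel) calls the callable on each next() until it
# returns the sentinel; a fresh object() can never be returned by the
# lambda, so the iterator is infinite and always reads the *current*
# value of `data`.
feed = iter(lambda: data, object())

data = 10
first = next(feed)   # reads 10
data = 20
second = next(feed)  # reads 20
print(first, second)  # 10 20
```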

But because this only works on one frame at a time, it won't do for n:1 or 1:n filters (and I don't even want to think about m:n kinds of scenarios).

A "peekable" iterator that lets you see what the next value is going to be before you iterate onto it might let you support a limited form of selective filtering for n:1 processes (that is, where a possibly-variable n input frames go into one output frame). The limitation is that you can only do the selective filtering on the first of the n frames that is going to be consumed by the processing; the others will get taken without you getting a chance to check them first. Maybe that's good enough?

Anyway, here's what that looks like:

_sentinel = object()
class PeekableIterator:
    def __init__(self, input_iterable):
        self.iterator = iter(input_iterable)
        self.next_value = next(self.iterator, _sentinel)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_value is not _sentinel:
            return_value = self.next_value
            self.next_value = next(self.iterator, _sentinel)
            return return_value
        raise StopIteration

    def peek(self):                 # this is not part of the iteration protocol!
        if self.next_value is not _sentinel:
            return self.next_value
        raise ValueError("input exhausted")

def selective_processing_Nto1(processing_func, condition, input_iterable):
    peekable = PeekableIterator(input_iterable)
    processing_iter = processing_func(peekable)
    while True:
        try:
            value = peekable.peek()
        except ValueError:
            return
        try:
            yield next(processing_iter) if condition(value) else next(peekable)
        except StopIteration:
            return

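As a sanity check, here is how the n:1 version might behave with a hypothetical 2:1 processing function (pair_sum is an assumed example, and the supporting definitions are repeated so the snippet runs standalone). Note how frame 3 is swallowed by the processor even though it fails the condition - exactly the limitation described above:

```python
_sentinel = object()

class PeekableIterator:
    def __init__(self, input_iterable):
        self.iterator = iter(input_iterable)
        self.next_value = next(self.iterator, _sentinel)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_value is not _sentinel:
            return_value = self.next_value
            self.next_value = next(self.iterator, _sentinel)
            return return_value
        raise StopIteration

    def peek(self):
        if self.next_value is not _sentinel:
            return self.next_value
        raise ValueError("input exhausted")

def selective_processing_Nto1(processing_func, condition, input_iterable):
    peekable = PeekableIterator(input_iterable)
    processing_iter = processing_func(peekable)
    while True:
        try:
            value = peekable.peek()
        except ValueError:
            return
        try:
            yield next(processing_iter) if condition(value) else next(peekable)
        except StopIteration:
            return

def pair_sum(frames):
    # hypothetical 2:1 processor: consumes frames in pairs, yields their sum
    while True:
        a = next(frames, _sentinel)
        b = next(frames, _sentinel)
        if a is _sentinel or b is _sentinel:
            return
        yield a + b

result = list(selective_processing_Nto1(pair_sum, lambda x: x < 3, range(6)))
print(result)  # [1, 5, 4, 5] - frame 3 was consumed by pair_sum; 4 and 5 bypassed
```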
This is as good as we can practically do when the processing function is a generator. If we wanted to do more, such as supporting 1:n processing, we'd need some way to know how large the n was going to be, so we could get that many values before deciding if we will pass on the next input value or not. While you could write a custom class for the processing that would report that, it is probably less convenient than just calling the processing function repeatedly as you do in the question.

It really seems like you all are making this too complicated. If you think of a data processing pipeline as

    source -> transform -> filter -> sink

where source, transform, and filter are all generators, this is similar to Unix pipelines:

    cat f | tr 'a' 'A' | grep 'word' > /dev/null

then you can see how a pipeline works (conceptually). One big difference is that Unix pipelines push data, whereas with Python generators you pull data.

Using some of your functions:

# this is a source
def add_one(numbers):
    print("new generator created")
    # this is the output that becomes the next function's input
    for number in numbers:
        yield number + 1  

# this is a transform
def filter(input, predicate):
    for item in input:
        if predicate(item):
            yield item

# this is the sink
def save(input, filename):
    with open(filename, 'w') as f:
        for item in input:
            f.write(f"{item}\n")

To put the pipeline of generators together in Python you start with the source, then pass it to a transform or filter as a parameter that can be iterated over. Of course each of the generators has a yield statement. Finally the outermost function is the sink, and it consumes the values while it iterates. It looks like this. You can see how the predicate function is passed to the filter function in addition to the "source" of its data.

# now run the pipeline
save(filter(add_one(range(20)), is_less_than_three), 'myfile')

Some find that this looks awkward, but if you think of mathematical notation it is easier. I am sure you have seen f(g(x)), which is exactly the same notation. You could also write it as:

save(filter(add_one(range(20)),
            is_less_than_three),
     'myfile')

which shows better how the parameters are used.

To recap: the pipeline starts with a source, which is a generator. In this case it won't have a generator as its own source. It may have non-generator input such as a list of numbers (your example), or create the items some other way, such as by reading a file.

Transform generators always have a generator as their source, and they yield output to their "sink". In other words a transform acts like a sink to its source and like a source to its sink.

The sink, the final part of the pipeline, just iterates over its source input. It consumes its input and doesn't yield any output; its job is to consume the items, by processing, saving, printing, or whatever.

A transform is an m:n function that for m inputs produces n outputs, meaning it can filter out some inputs and not pass them on, or produce multiple outputs by creating new items. An example might be transforming a video stream from 60fps to 30fps. For every two input frames it produces one output frame.
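That last case (60fps to 30fps) can be sketched as a tiny 2:1 transform generator; the name and the keep-one-drop-one policy are illustrative assumptions:

```python
def halve_rate(frames):
    # 2:1 transform: keep the first frame of every pair, drop the second
    iterator = iter(frames)
    for frame in iterator:
        yield frame
        next(iterator, None)  # discard the second frame of the pair, if any

print(list(halve_rate(range(6))))  # [0, 2, 4]
```

Like the other transforms above, it takes a generator as its source and yields to its sink, so it slots into the same pipeline style.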
