
How to execute two “aggregate” functions (like sum) concurrently, feeding them from the same iterator?

Imagine we have an iterator, say iter(range(1, 1000)). And we have two functions, each accepting an iterator as the only parameter, say sum() and max(). In SQL world we would call them aggregate functions.

Is there any way to obtain results of both without buffering the iterator output?

To do it, we would need to pause and resume aggregate function execution, in order to feed them both with the same values without storing them. Perhaps there is a way to express it using async machinery, without sleeps?

Let's consider how to apply two aggregate functions to the same iterator, which we can only exhaust once. The initial attempt (which hardcodes sum and max for brevity, but is trivially generalizable to an arbitrary number of aggregate functions) might look like this:

def max_and_sum_buffer(it):
    content = list(it)
    p = sum(content)
    m = max(content)
    return p, m

This implementation has the downside that it stores all the generated elements in memory at once, despite both functions being perfectly capable of stream processing. The question anticipates this cop-out and explicitly requests the result to be produced without buffering the iterator output. Is it possible to do this?

Serial execution: itertools.tee

It certainly seems possible. After all, Python iterators are external, so every iterator is already capable of suspending itself. How hard can it be to provide an adapter that splits an iterator into two new iterators that provide the same content? Indeed, this is exactly the description of itertools.tee, which appears perfectly suited to parallel iteration:

import itertools

def max_and_sum_tee(it):
    it1, it2 = itertools.tee(it)
    p = sum(it1)  # XXX: exhausts it1 before max(it2) ever runs
    m = max(it2)
    return p, m

The above produces the correct result, but doesn't work the way we'd like it to. The trouble is that we're not iterating in parallel. Aggregate functions like sum and max never suspend - each insists on consuming all of the iterator content before producing the result. So sum will exhaust it1 before max has had a chance to run at all. Exhausting elements of it1 while leaving it2 alone will cause those elements to be accumulated inside an internal FIFO shared between the two iterators. That's unavoidable here - since max(it2) must see the same elements, tee has no choice but to accumulate them. (For more interesting details on tee, refer to this post.)
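
A small illustration of that accumulation (a sketch using a shorter range, just for demonstration):

import itertools

it1, it2 = itertools.tee(iter(range(5)))
print(sum(it1))   # 10 - exhausts it1; tee must queue all five elements internally
print(max(it2))   # 4  - it2 replays the queued elements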

In other words, there is no difference between this implementation and the first one, except that the first one at least makes the buffering explicit. To eliminate buffering, sum and max must run in parallel, not one after the other.

Threads: concurrent.futures

Let's see what happens if we run the aggregate functions in separate threads, still using tee to duplicate the original iterator:

import itertools
import concurrent.futures

def max_and_sum_threads_simple(it):
    it1, it2 = itertools.tee(it)

    with concurrent.futures.ThreadPoolExecutor(2) as executor:
        sum_future = executor.submit(lambda: sum(it1))
        max_future = executor.submit(lambda: max(it2))

    return sum_future.result(), max_future.result()

Now sum and max actually run in parallel (as much as the GIL permits), threads being managed by the excellent concurrent.futures module. It has a fatal flaw, however: for tee not to buffer the data, sum and max must process their items at exactly the same rate. If one is even a little bit faster than the other, they will drift apart, and tee will buffer all intermediate elements. Since there is no way to predict how fast each will run, the amount of buffering is unpredictable, with the nasty worst case of buffering everything.

To ensure that no buffering occurs, tee must be replaced with a custom generator that buffers nothing and blocks until all the consumers have observed the previous value before proceeding to the next one. As before, each consumer runs in its own thread, but now the calling thread is busy running a producer, a loop that actually iterates over the source iterator and signals that a new value is available. Here is an implementation:

import threading
import concurrent.futures

def max_and_sum_threads(it):
    STOP = object()
    next_val = None
    consumed = threading.Barrier(2 + 1)  # 2 consumers + 1 producer
    val_id = 0
    got_val = threading.Condition()

    def send(val):
        nonlocal next_val, val_id
        consumed.wait()
        with got_val:
            next_val = val
            val_id += 1
            got_val.notify_all()

    def produce():
        for elem in it:
            send(elem)
        send(STOP)

    def consume():
        last_val_id = -1
        while True:
            consumed.wait()
            with got_val:
                got_val.wait_for(lambda: val_id != last_val_id)
            if next_val is STOP:
                return
            yield next_val
            last_val_id = val_id

    with concurrent.futures.ThreadPoolExecutor(2) as executor:
        sum_future = executor.submit(lambda: sum(consume()))
        max_future = executor.submit(lambda: max(consume()))
        produce()

    return sum_future.result(), max_future.result()

This is quite some amount of code for something so simple conceptually, but it is necessary for correct operation.

produce() loops over the outside iterator and sends the items to the consumers, one value at a time. It uses a barrier, a convenient synchronization primitive added in Python 3.2, to wait until all consumers are done with the old value before overwriting it with the new one in next_val. Once the new value is actually ready, a condition is broadcast. consume() is a generator that transmits the produced values as they arrive, until detecting STOP. The code can be generalized to run any number of aggregate functions in parallel by creating consumers in a loop and adjusting their number when creating the barrier, as sketched below.
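
A hedged sketch of that generalization (the name aggregate_threads and its variadic signature are assumptions made here, not part of the original code; the internals mirror max_and_sum_threads above):

import threading
import concurrent.futures

def aggregate_threads(it, *functions):
    STOP = object()
    next_val = None
    consumed = threading.Barrier(len(functions) + 1)  # n consumers + 1 producer
    val_id = 0
    got_val = threading.Condition()

    def send(val):
        nonlocal next_val, val_id
        consumed.wait()
        with got_val:
            next_val = val
            val_id += 1
            got_val.notify_all()

    def produce():
        for elem in it:
            send(elem)
        send(STOP)

    def consume():
        last_val_id = -1
        while True:
            consumed.wait()
            with got_val:
                got_val.wait_for(lambda: val_id != last_val_id)
            if next_val is STOP:
                return
            yield next_val
            last_val_id = val_id

    with concurrent.futures.ThreadPoolExecutor(len(functions)) as executor:
        # bind each function as a default argument so every lambda keeps its own f
        futures = [executor.submit(lambda f=f: f(consume())) for f in functions]
        produce()

    return tuple(fut.result() for fut in futures)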

The downside of this implementation is that it requires creation of threads (possibly alleviated by making the thread pool global) and a lot of very careful synchronization at each iteration pass. This synchronization destroys performance - this version is almost 2000 times slower than the single-threaded tee, and 475 times slower than the simple but non-deterministic threaded version.

Still, as long as threads are used, there is no avoiding synchronization in some form. To completely eliminate synchronization, we must abandon threads and switch to cooperative multi-tasking. The question is: is it possible to suspend execution of ordinary synchronous functions like sum and max in order to switch between them?

Fibers: greenlet

It turns out that the greenlet third-party extension module enables exactly that. Greenlets are an implementation of fibers, lightweight micro-threads that switch between each other explicitly. This is sort of like Python generators, which use yield to suspend, except greenlets offer a much more flexible suspension mechanism, allowing one to choose who to suspend to.
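
A minimal sketch of that explicit switching (it requires the third-party greenlet package; the names child and main are invented for illustration):

import greenlet

def child():
    print("in child")
    main.switch("hello from child")   # explicitly hand control back to main

main = greenlet.getcurrent()
g = greenlet.greenlet(child)
print(g.switch())   # runs child until it switches back; prints "hello from child"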

This makes it fairly easy to port the threaded version of max_and_sum to greenlets:

import greenlet

def max_and_sum_greenlet(it):
    STOP = object()
    consumers = None

    def send(val):
        for g in consumers:
            g.switch(val)

    def produce():
        for elem in it:
            send(elem)
        send(STOP)

    def consume():
        g_produce = greenlet.getcurrent().parent
        while True:
            val = g_produce.switch()
            if val is STOP:
                return
            yield val

    sum_result = []
    max_result = []
    gsum = greenlet.greenlet(lambda: sum_result.append(sum(consume())))
    gsum.switch()
    gmax = greenlet.greenlet(lambda: max_result.append(max(consume())))
    gmax.switch()
    consumers = (gsum, gmax)
    produce()

    return sum_result[0], max_result[0]

The logic is the same, but with less code. As before, produce produces values retrieved from the source iterator, but its send doesn't bother with synchronization, as it doesn't need to when everything is single-threaded. Instead, it explicitly switches to every consumer in turn to do its thing, with the consumer dutifully switching right back. After going through all consumers, the producer is ready for the next iteration pass.

Results are retrieved using an intermediate single-element list because greenlet doesn't provide access to the return value of the target function (and neither does threading.Thread, which is why we opted for concurrent.futures above).

There are downsides to using greenlets, though. First, they don't come with the standard library; you need to install the greenlet extension. Then, greenlet is inherently non-portable because the stack-switching code is not supported by the OS and the compiler and can be considered somewhat of a hack (although an extremely clever one). A Python targeting WebAssembly or JVM or GraalVM would be very unlikely to support greenlet. This is not a pressing issue, but it's definitely something to keep in mind for the long haul.

Coroutines: asyncio

As of Python 3.5, Python provides native coroutines. Unlike greenlets, and similar to generators, coroutines are distinct from regular functions and must be defined using async def. Coroutines can't be easily executed from synchronous code; they must instead be processed by a scheduler which drives them to completion. The scheduler is also known as an event loop because its other job is to receive IO events and pass them to appropriate callbacks and coroutines. In the standard library, this is the role of the asyncio module.

Before implementing an asyncio-based max_and_sum, we must first resolve a hurdle. Unlike greenlet, asyncio is only able to suspend execution of coroutines, not of arbitrary functions. So we need to replace sum and max with coroutines that do essentially the same thing. This is as simple as implementing them in the obvious way, only replacing for with async for, enabling the async iterator to suspend the coroutine while waiting for the next value to arrive:

async def asum(it):
    s = 0
    async for elem in it:
        s += elem
    return s

async def amax(it):
    NONE_YET = object()
    largest = NONE_YET
    async for elem in it:
        if largest is NONE_YET or elem > largest:
            largest = elem
    if largest is NONE_YET:
        raise ValueError("amax() arg is an empty sequence")
    return largest

# or, using https://github.com/vxgmichel/aiostream
#
#from aiostream.stream import accumulate
#def asum(it):
#    return accumulate(it, initializer=0)
#def amax(it):
#    return accumulate(it, max)

One could reasonably ask if providing a new pair of aggregate functions is cheating; after all, the previous solutions were careful to use the existing sum and max built-ins. The answer will depend on the exact interpretation of the question, but I would argue that the new functions are allowed because they are in no way specific to the task at hand. They do the exact same thing the built-ins do, but consuming async iterators. I suspect that the only reason such functions don't already exist somewhere in the standard library is due to coroutines and async iterators being a relatively new feature.

With that out of the way, we can proceed to write max_and_sum as a coroutine:

import asyncio

async def max_and_sum_asyncio(it):
    loop = asyncio.get_event_loop()
    STOP = object()

    next_val = loop.create_future()
    consumed = loop.create_future()
    used_cnt = 2  # number of consumers

    async def produce():
        for elem in it:
            next_val.set_result(elem)
            await consumed
        next_val.set_result(STOP)

    async def consume():
        nonlocal next_val, consumed, used_cnt
        while True:
            val = await next_val
            if val is STOP:
                return
            yield val
            used_cnt -= 1
            if not used_cnt:
                consumed.set_result(None)
                consumed = loop.create_future()
                next_val = loop.create_future()
                used_cnt = 2
            else:
                await consumed

    s, m, _ = await asyncio.gather(asum(consume()), amax(consume()),
                                   produce())
    return s, m

Although this version is based on switching between coroutines inside a single thread, just like the one using greenlet, it looks different. asyncio doesn't provide explicit switching of coroutines; it bases task switching on the await suspension/resumption primitive. The target of await can be another coroutine, but also an abstract "future", a value placeholder which will be filled in later by some other coroutine. Once the awaited value becomes available, the event loop automatically resumes execution of the coroutine, with the await expression evaluating to the provided value. So instead of produce switching to consumers, it suspends itself by awaiting a future that will arrive once all the consumers have observed the produced value.
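
To see that handoff in isolation, here is a tiny sketch of one coroutine awaiting a future that another coroutine fills in (the names are invented for illustration):

import asyncio

async def producer(fut):
    fut.set_result(42)          # fill in the placeholder

async def consumer(fut):
    val = await fut             # suspends until the future has a result
    print("got", val)

loop = asyncio.get_event_loop()
fut = loop.create_future()
loop.run_until_complete(asyncio.gather(consumer(fut), producer(fut)))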

consume() is an asynchronous generator, which is like an ordinary generator, except it creates an async iterator, which our aggregate coroutines are already prepared to accept by using async for. An async iterator's equivalent of __next__ is called __anext__ and is a coroutine, allowing the coroutine that exhausts the async iterator to suspend while waiting for the new value to arrive. When a running async generator suspends on an await, that is observed by async for as a suspension of the implicit __anext__ invocation. consume() does exactly that when it waits for the values provided by produce and, as they become available, transmits them to aggregate coroutines like asum and amax. Waiting is realized using the next_val future, which carries the next element from it. Awaiting that future inside consume() suspends the async generator, and with it the aggregate coroutine.
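
As a rough sketch of what async for does under the hood (the helper names below are invented for illustration), the loop repeatedly awaits __anext__ until StopAsyncIteration is raised:

import asyncio

async def agen():
    for i in range(3):
        yield i

async def manual_async_for():
    ait = agen().__aiter__()
    while True:
        try:
            elem = await ait.__anext__()   # a coroutine; may suspend here
        except StopAsyncIteration:
            break
        print(elem)

asyncio.get_event_loop().run_until_complete(manual_async_for())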

The advantage of this approach compared to greenlets' explicit switching is that it makes it much easier to combine coroutines that don't know of each other into the same event loop. For example, one could have two instances of max_and_sum running in parallel (in the same thread), or run a more complex aggregate function that invokes further async code to do calculations.
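
For instance, running two independent instances on the same event loop might look like this sketch (it relies on the definitions above; the argument ranges are arbitrary):

async def two_at_once():
    return await asyncio.gather(
        max_and_sum_asyncio(iter(range(1, 1000))),
        max_and_sum_asyncio(iter(range(1, 2000))),
    )

print(asyncio.get_event_loop().run_until_complete(two_at_once()))
# [(499500, 999), (1999000, 1999)]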

The following convenience function shows how to run the above from non-asyncio code:

def max_and_sum_asyncio_sync(it):
    # trivially instantiate the coroutine and execute it in the
    # default event loop
    coro = max_and_sum_asyncio(it)
    return asyncio.get_event_loop().run_until_complete(coro)
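
As a minor aside not present in the original answer: on Python 3.7 and later the same thing can be spelled with asyncio.run (the function name here is invented):

def max_and_sum_asyncio_sync_37(it):
    # assumes Python >= 3.7
    return asyncio.run(max_and_sum_asyncio(it))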

Performance

Measuring and comparing performance of these approaches to parallel execution can be misleading because sum and max do almost no processing, which over-stresses the overhead of parallelization. Treat these as you would treat any microbenchmarks, with a large grain of salt. Having said that, let's look at the numbers anyway!

Measurements were produced using Python 3.6. The functions were run only once and given range(10000), their time measured by subtracting time.time() before and after the execution. Here are the results:

  • max_and_sum_buffer and max_and_sum_tee: 0.66 ms - almost exactly the same time for both, with the tee version being a bit faster.

  • max_and_sum_threads_simple: 2.7 ms. This timing means very little because of non-deterministic buffering, so this might be measuring the time to start two threads and the synchronization internally performed by Python.

  • max_and_sum_threads: 1.29 seconds, by far the slowest option, ~2000 times slower than the fastest one. This horrible result is likely caused by a combination of the multiple synchronizations performed at each step of the iteration and their interaction with the GIL.

  • max_and_sum_greenlet: 25.5 ms, slow compared to the initial version, but much faster than the threaded version. With a sufficiently complex aggregate function, one can imagine using this version in production.

  • max_and_sum_asyncio: 351 ms, almost 14 times slower than the greenlet version. This is a disappointing result because asyncio coroutines are more lightweight than greenlets, and switching between them should be much faster than switching between fibers. It is likely that the overhead of running the coroutine scheduler and the event loop (which in this case is overkill given that the code does no IO) is destroying the performance on this micro-benchmark.

  • max_and_sum_asyncio using uvloop: 125 ms. This is more than twice the speed of regular asyncio, but still almost 5x slower than greenlet.

Running the examples under PyPy doesn't bring significant speedup; in fact most of the examples run slightly slower, even after running them several times to ensure JIT warmup. The asyncio function requires a rewrite not to use async generators (since PyPy as of this writing implements Python 3.5), and executes in somewhat under 100 ms. This is comparable to CPython+uvloop performance, i.e. better, but not dramatic compared to greenlet.

If it holds for your aggregate functions that f(a,b,c,...) == f(a, f(b, f(c, ...))), then you could just cycle through your functions and feed them one element at a time, each time combining them with the result of the previous application, like reduce would do, e.g. like this:

def aggregate(iterator, *functions):
    first = next(iterator)
    result = [first] * len(functions)
    for item in iterator:
        for i, f in enumerate(functions):
            result[i] = f((result[i], item))
    return result
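
A quick usage sketch of the helper above (the expected values follow from range(1, 1000)):

it = iter(range(1, 1000))
total, largest, smallest = aggregate(it, sum, max, min)
print(total, largest, smallest)   # 499500 999 1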

This is considerably slower (about 10-20 times) than just materializing the iterator in a list and applying the aggregate function on the list as a whole, or using itertools.tee (which basically does the same thing, internally), but it has the benefit of using no additional memory.

Note, however, that while this works well for functions like sum, min or max, it does not work for other aggregating functions, e.g. finding the mean or median element of an iterator, as mean(a, b, c) != mean(a, mean(b, c)). (For mean, you could of course just get the sum and divide it by the number of elements, but computing e.g. the median taking just one element at a time will be more challenging.)
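
For mean specifically, a small single-pass sketch (not part of the answer above) just carries a running total and a count:

def mean_one_pass(iterator):
    total = 0
    count = 0
    for item in iterator:
        total += item
        count += 1
    if count == 0:
        raise ValueError("mean of an empty iterator")
    return total / count

print(mean_one_pass(iter(range(1, 1000))))   # 500.0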
