
Non-blocking, non-concurrent tasks in Python

I am working on an implementation of a very small library in Python that has to be non-blocking.

At some point in production code, a call to this library will be made and it needs to do its own work. In its simplest form it would be a callable that needs to pass some information to a service.

This "passing information to a service" is a non-intensive task, probably sending some data to an HTTP service or something similar. It also doesn't need to be concurrent or to share information, however it does need to terminate at some point, possibly with a timeout.

I have used the threading module before and it seems the most appropriate thing to use, but the application where this library will be used is so big that I am worried about hitting the threading limit.

On local testing I was able to hit that limit at around ~2500 threads spawned.

There is a good possibility (given the size of the application) that I can hit that limit easily. It also makes me wary of using a Queue, given the memory implications of placing tasks in it at a high rate.

I have also looked at gevent but I couldn't see an example of being able to spawn something that would do some work and terminate without joining. The examples I went through were calling .join() on a spawned Greenlet or on an array of greenlets.

I don't need to know the result of the work being done! It just needs to fire off, try to talk to the HTTP service, and die with a sensible timeout if it couldn't.
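That fire-and-forget behaviour can be sketched with a plain daemon thread and a bounded timeout. This is only an illustration of the pattern described above; the URL, payload, and function names are placeholders, not part of the question:

```python
import threading
import urllib.request


def notify_service(url, payload, timeout=2.0):
    """Best-effort notification: bounded by `timeout`, all errors swallowed."""
    try:
        req = urllib.request.Request(url, data=payload)
        urllib.request.urlopen(req, timeout=timeout)
    except Exception:
        pass  # we don't care about the result


def fire_and_forget(url, payload):
    """Spawn a daemon thread so the call never blocks, or outlives, the caller."""
    t = threading.Thread(target=notify_service, args=(url, payload))
    t.daemon = True  # a daemon thread won't keep the process alive
    t.start()
```

Because the thread is a daemon and notify_service catches everything, the caller returns immediately and a failed or timed-out attempt dies quietly. This doesn't solve the thread-count limit, though, which is the real question below.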

Have I misinterpreted the guides/tutorials for gevent? Is there any other way to spawn a callable in a fully non-blocking fashion that won't hit the ~2500 limit?

This is a simple example in threading that does work as I would expect:

import time
from threading import Thread


class Synchronizer(Thread):

    def __init__(self, number):
        Thread.__init__(self)
        self.number = number

    def run(self):
        # Simulating some work
        time.sleep(5)
        print(self.number)


for i in range(4000):  # totally doesn't get past ~2,500
    sync = Synchronizer(i)
    sync.daemon = True
    sync.start()
    print("spawned a thread, number %s" % i)

And this is what I've tried with gevent, where it obviously blocks at the end to see what the workers did:

import gevent


def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(1)
    print('Task', pid, 'done')


for i in range(100):
    gevent.spawn(task, i)

EDIT: My problem stemmed from my lack of familiarity with gevent. While the Thread code was indeed spawning threads, it also prevented the script from terminating while it did some work.

gevent doesn't really do that in the code above, unless you add a .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make it a long-running process. This definitely fixes my problem, as the code that needs to spawn the greenlets runs within a framework that is itself a long-running process.

Nothing requires you to call join in gevent, if you're expecting your main thread to last longer than any of your workers.

The only reason for the join call is to make sure the main thread lasts at least as long as all of the workers (so that the program doesn't terminate early).
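A minimal sketch of that point, assuming gevent is installed: spawn greenlets with no join and simply keep the main greenlet alive longer than the workers. The short gevent.sleep at the end stands in for a long-running process and is only there so this standalone snippet doesn't exit immediately:

```python
import gevent


def task(pid):
    # Simulate a short unit of work.
    gevent.sleep(0.1)
    return pid


# No join: spawned greenlets run as long as the main greenlet is alive.
greenlets = [gevent.spawn(task, i) for i in range(5)]

# In a real long-running process (a server, a framework worker) nothing
# more is needed; the event loop keeps running. Here we yield briefly
# so the greenlets can finish before the script exits.
gevent.sleep(0.2)
```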

Rather than calling it directly, why not spawn a subprocess with a connected pipe (or something similar), toss your messages onto the pipe, and let the subprocess handle them completely out of band?
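A rough sketch of that suggestion using the stdlib multiprocessing Pipe; the message format and the None sentinel protocol are illustrative choices, not part of the answer:

```python
import multiprocessing


def worker(conn):
    """Out-of-band worker: drain messages until a None sentinel arrives."""
    while True:
        msg = conn.recv()
        if msg is None:
            break
        # ... forward msg to the HTTP service here ...


# "fork" keeps this sketch self-contained on POSIX (no __main__ guard needed).
ctx = multiprocessing.get_context("fork")
parent_conn, child_conn = ctx.Pipe()
proc = ctx.Process(target=worker, args=(child_conn,), daemon=True)
proc.start()

parent_conn.send({"event": "user signed up"})  # cheap, near-instant for the caller
parent_conn.send(None)                         # sentinel: tell the worker to exit
proc.join(timeout=2)
```

Sends on the pipe return almost immediately for small payloads, so the calling process never blocks on the actual HTTP work.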

As explained in Understanding Asynchronous/Multiprocessing in Python, the asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200MB. If you want to, you can mix threads in the rest of the system with asyncoro coroutines (provided threads and coroutines don't share variables, but use coroutine interface functions to send messages etc.).


 