Signaling between threads in Python

I am working on a realtime data grabber. I have a while True loop, and inside it I spawn threads that do relatively small tasks (I am querying a third-party API over HTTP, and to achieve fast speeds I am querying in parallel).

Every thread takes care of updating a specific data series. This might take 2, 3 or even 5 seconds. However, my while True loop might spawn threads faster than the threads can finish. Hence, I need each spawned thread to wait for the previous thread on the same series to finish.

In general, it's unpredictable how long the threads take to finish, because they query an HTTP server.

I was thinking of creating a named semaphore for every thread, and then if a thread spawned for a specific series finds a previous thread working on the same series, it will wait.

The only issue that I can see is a possible backlog of threads.

What is the best solution here? Should I look into things like Celery? I am currently using the threading module.

Thanks!

NO! Please, for the love of your God or intelligent designer, don't do that! Don't continually create/spawn/whatever threads and try to micro-manage them. Use a thread pool: create some threads at startup and pass them a producer-consumer queue to wait on for class instances representing those HTTP tasks.
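A minimal sketch of that thread-pool idea, using only the standard library. The names fetch, worker and run_pool are hypothetical, fetch stands in for the real HTTP query, and plain strings are used as tasks instead of class instances to keep it short:

```python
import queue
import threading


def fetch(series):
    # Hypothetical stand-in for the blocking HTTP query (assumption).
    return f"data for {series}"


def worker(tasks, results):
    # Each pool thread loops forever, pulling tasks off the shared queue.
    while True:
        series = tasks.get()
        try:
            if series is None:  # sentinel: no more work for this thread
                return
            results.put((series, fetch(series)))
        finally:
            tasks.task_done()


def run_pool(series_names, num_workers=4):
    tasks = queue.Queue()
    results = queue.Queue()
    # Create the threads once, at startup, instead of once per request.
    workers = [
        threading.Thread(target=worker, args=(tasks, results), daemon=True)
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for name in series_names:
        tasks.put(name)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    collected = {}
    while not results.empty():
        series, data = results.get()
        collected[series] = data
    return collected
```

In a long-running grabber the main loop would keep putting tasks on the queue rather than sending sentinels straight away. Note that a shared pool like this does not by itself stop two requests for the same series from running at the same time.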

You should use Queue.Queue (the module is named queue in Python 3). Create a queue for each series, and a thread to listen on that queue. Each time you need to read a series, put a request in the queue. The thread waits for items in the queue and, for each one it receives, reads the data.
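Roughly, the one-queue-and-one-thread-per-series approach could look like the sketch below (Python 3, standard library only). SeriesWorker, fetch and max_pending are hypothetical names. The max_pending bound is an added assumption, meant as one way to limit the backlog the question worries about:

```python
import queue
import threading


def fetch(series):
    # Hypothetical stand-in for the blocking HTTP query (assumption).
    return f"data for {series}"


class SeriesWorker:
    """One queue and one listening thread for a single data series."""

    def __init__(self, series, max_pending=0):
        self.series = series
        # max_pending=0 means an unbounded queue; a positive value caps the backlog.
        self.requests = queue.Queue(maxsize=max_pending)
        self.results = []
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Requests for this series are handled strictly one after another.
        while True:
            request = self.requests.get()
            try:
                if request is None:  # sentinel: stop listening
                    return
                self.results.append(fetch(self.series))
            finally:
                self.requests.task_done()

    def request_update(self):
        # Returns False instead of blocking when the backlog is full.
        try:
            self.requests.put_nowait("update")
            return True
        except queue.Full:
            return False

    def stop(self):
        self.requests.put(None)
        self.thread.join()
```

Because each series has exactly one consumer thread, a new request for a series waits for the previous one to finish, with no semaphores involved.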

Another option, if you are just re-querying the API every time one of your queries returns, is an asynchronous framework like Twisted (see their tutorial on threading). I'm a relative Twisted beginner, so there may be better ways of twisting Twisted to your task than this:

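from twisted.internet import reactor, threads

def simple_task():
    # query_your_api() is a placeholder for your blocking HTTP call
    status = query_your_api()
    return status

def repeating_call(status):
    print(status)
    # Run the blocking call in Twisted's thread pool and re-issue it when it returns
    d = threads.deferToThread(simple_task)
    d.addCallback(repeating_call)

# data1, data2, data3 are placeholders for your data series
data_series = [data1, data2, data3]
for data in data_series:
    repeating_call('starting everything up')

reactor.run()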
