
Using multiple cores with Python and Eventlet

I have a Python web application in which the client (Ember.js) communicates with the server via WebSocket (I am using Flask-SocketIO). Apart from the WebSocket server, the backend does two more things worth mentioning:

When the client submits an image, its entity is created in the database and the id is put into an image conversion queue. The worker grabs it and does the image conversion. After that, the worker puts it into the OCR queue, where it is handled by the OCR queue worker.

So far so good. The WS requests are handled synchronously in separate threads (Flask-SocketIO uses Eventlet for that) and the heavy computational work happens asynchronously (in separate threads as well).

Now the problem: the whole application runs on a Raspberry Pi 3. If I do not make use of its 4 cores, I only have a single ARMv8 core clocked at 1.2 GHz, which is very little power for OCR. So I decided to find out how to use multiple cores with Python. Although I had read about the problems with the GIL, I found the multiprocessing package, whose documentation says: "The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads." Exactly what I wanted. So I instantly replaced

from threading import Thread
thread = Thread(target=heavy_computational_worker_thread)
thread.start()

by

from multiprocessing import Process
process = Process(target=heavy_computational_worker_thread)
process.start()

The queue needed to be shared across the multiple cores as well, so I had to change

from queue import Queue
queue = Queue()

to

import multiprocessing
queue = multiprocessing.Queue()

as well. The problem: the queue and Thread libraries are monkey patched by Eventlet. If I stop using the monkey patched versions of Thread and Queue and use the ones from multiprocessing instead, then the request thread started by Eventlet blocks forever when accessing the queue.
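For illustration, here is a minimal standalone sketch (not code from the application, and assuming a plain Eventlet setup with nothing else running) that reproduces the kind of hang described above: a green thread blocking on a multiprocessing.Queue stalls the single OS thread that runs all green threads, so everything else stops too.

import eventlet
eventlet.monkey_patch()

import multiprocessing

mp_queue = multiprocessing.Queue()

def heartbeat():
    # Stand-in for a WebSocket handler; it should keep printing once a second.
    while True:
        print('server still responsive')
        eventlet.sleep(1)

def consumer():
    # get() ends up in an OS-level blocking read that eventlet cannot yield
    # around, so it stalls the one OS thread that runs every green thread.
    print('consumer got:', mp_queue.get())

eventlet.spawn(heartbeat)
eventlet.spawn_after(3, consumer)
eventlet.sleep(10)
# With the monkey patched queue.Queue, heartbeat would keep printing.
# Observed here: after about 3 seconds, when consumer() reaches get(), the
# heartbeat output stops and the script hangs, matching the behaviour
# described above.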

Now my question:

Is there any way I can make this application do the OCR and image conversion on a separate core?

I would like to keep using WebSocket and Eventlet if that's possible. The advantage I have is that the only communication interface between the processes would be the queue.

Ideas that I already had:

- Not using a Python implementation of a queue but rather using I/O, for example a dedicated Redis instance that the different subprocesses would access (a sketch of this is shown below).
- Going a step further: starting every queue worker as a separate Python process (e.g. python3 wsserver | python3 ocrqueue | python3 imgconvqueue). Then I would have to make sure myself that access to the queue and to the database is non-blocking.
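For the first idea, a minimal sketch of a Redis-backed queue could look like the code below. The queue name, the JSON payload, process_image() and the worker file name are illustrative assumptions, not parts of the original application; the point is that the producer side stays cheap for the Eventlet server while the blocking consumer lives in its own process.

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# In the Flask-SocketIO / Eventlet process: hand the job off and return.
def enqueue_image(image_id):
    # RPUSH is a short socket round trip; the monkey patched socket lets
    # other green threads run while it completes.
    r.rpush('ocr_queue', json.dumps({'image_id': image_id}))

# In a separate worker process (e.g. started as `python3 ocr_worker.py`).
def worker_loop():
    while True:
        # BLPOP blocks only this worker process; the web process is unaffected.
        _key, raw = r.blpop('ocr_queue')
        job = json.loads(raw)
        process_image(job['image_id'])

def process_image(image_id):
    # Hypothetical placeholder for the actual image conversion / OCR step.
    print('processing image', image_id)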

The best thing would be to keep the single process and make it work with multiprocessing, though.

Thank you very much in advance.

Eventlet is currently incompatible with the multiprocessing package. There is an open issue for this work: https://github.com/eventlet/eventlet/issues/210.

The alternative that I think will work well in your case is to use Celery to manage your queue. Celery will start a pool of worker processes that wait for tasks provided by the main process via a message queue (RabbitMQ and Redis are both supported).

The Celery workers do not need to use eventlet, only the main server does, so this frees them to do whatever they need to do without the limitations imposed by eventlet.
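A minimal sketch of how this could be wired up follows; the broker URL, the event name and the task are assumptions made for illustration and are not taken from the question or from the flack example.

from celery import Celery
from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
celery = Celery(app.name, broker='redis://localhost:6379/0')
socketio = SocketIO(app, async_mode='eventlet')

@celery.task
def convert_and_ocr(image_id):
    # Runs inside a plain Celery worker process: no eventlet, and with the
    # default prefork pool there is one worker process per CPU core.
    print('processing image', image_id)

@socketio.on('image_submitted')
def handle_image(data):
    # The eventlet server only hands the job off; delay() returns as soon as
    # the message has been sent to the broker.
    convert_and_ocr.delay(data['image_id'])

The workers would then be started as their own OS processes (for example with celery -A yourmodule worker, where yourmodule is whatever module defines the Celery instance), so the operating system can schedule the heavy work on the Pi's other cores while the eventlet server keeps handling WebSocket traffic.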

If you are interested in exploring this approach, I have a complete example that uses it: https://github.com/miguelgrinberg/flack.
