How to properly use multiprocessing module with Django?

I have a Python 3.8+ program using Django and PostgreSQL that requires multiple threads or processes. I cannot use threads because the GIL effectively restricts them to a single core, which results in awful performance (especially since most of the threads are CPU bound).

So the obvious solution was to use the multiprocessing module. But I've encountered several problems:

  1. When using spawn to generate new processes, I get the "Apps aren't loaded yet" error when the new process imports the Django models. This is because the new process doesn't have the database connection given to the main process by python manage.py runserver. I circumvented it by using fork instead of spawn (as advised here) so the connections are copied to the other processes, but I feel like this is not the best solution and there should be a clean way to start new processes with the necessary connections.

  2. When several of the processes access the database simultaneously, wrong results are sometimes returned (some of them even from the wrong models / relations), which crashes the program. This can happen during the initial startup when fetching data but also while the program is running. I tried to use ISOLATION LEVEL SERIALIZABLE (as advised here) by adding it to the options in the database settings, but that didn't work.
    A possible solution might be custom locks that are given to every process, but that doesn't feel like a good solution either.

So in general, the question is: Is there a good and clean way to use multiprocessing in Django without these issues? A way for new processes to have their database connections without relying on fork, and for all processes to access the database without race conditions that sometimes produce wrong results like this?

One important thing: I don't use a Pool since the processes aren't running the same simple task. Each process runs a different specific task, they share data via multiprocessing Signals, Queues, Values and Namespaces (shared memory), and new processes can be triggered by user interaction (websockets).
I've tried to look into Celery since it has been recommended in a lot of questions about Django and multiprocessing, but I wouldn't know how to fit something like that into the project structure, with the specific different processes that need to be created at specific points and the data that gets transferred over the Queues, Signals, Values and Namespaces in the existing project.

Thank you for reading; any help is appreciated!

With every new process, a setup function calling django.setup() is first called before executing the real function. My hope was that this way, every process would create an independent connection to the database so that the current system could work.
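The setup-then-run pattern described here could look roughly like the sketch below when combined with the spawn start method. The names run_with_django and make_worker, and the settings path "myproject.settings", are hypothetical and not taken from the original project. Under spawn the target has to be a picklable, module-level function, and it should import its models only inside its own body, after django.setup() has run.

```python
import multiprocessing as mp
import os


def run_with_django(settings_module, target, *args):
    """Child-side entry point: configure Django, then run the real function."""
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    import django  # imported here so the app registry is built in the child
    django.setup()
    return target(*args)


def make_worker(target, *args, settings_module="myproject.settings"):
    """Build (but do not start) a spawn-based process wrapping `target`.

    "myproject.settings" is a placeholder; pass the project's real settings path.
    """
    ctx = mp.get_context("spawn")
    return ctx.Process(
        target=run_with_django,
        args=(settings_module, target, *args),
    )
```

A caller would create the worker with make_worker(some_task, arg1, arg2) and then call .start() on it. Because a spawned child begins with a fresh interpreter, it opens its own database connection on its first query instead of inheriting one from the parent.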

Yes - you can do that with initializer, as explained in my other answer from yesteryear.

However, it still throws errors like django.db.utils.OperationalError: lost synchronization with server: got message type "1", length 976434746

That means you're using the fork start method for subprocesses, so any database connections and their state have been forked into the subprocesses too, and they will get out of sync when used by multiple processes.

You'll need to close them:

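from concurrent.futures import ProcessPoolExecutor

import django

def subprocess_setup():
    # Runs once in each worker process before it accepts tasks.
    django.setup()
    from django.db import connections
    # Drop connections inherited through fork; Django reopens them lazily.
    for conn in connections.all():
        conn.close()

with ProcessPoolExecutor(max_workers=5, initializer=subprocess_setup) as executor:
    ...  # submit work to the executor here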
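Since the question uses individual Process objects rather than a pool, a variant without an initializer may be useful. The sketch below closes the connections in the parent immediately before forking, so that the child has no open socket to inherit. The helper name start_forked is hypothetical, and the code assumes Django is already set up in the parent (for example under runserver).

```python
import multiprocessing as mp


def start_forked(target, *args):
    """Start `target` in a forked child that shares no DB socket with the parent."""
    # Assumption: django.setup() has already run in the parent process.
    from django.db import connections

    # Close every connection held by the parent; Django reopens a connection
    # lazily the next time a query is issued, in parent and child alike.
    connections.close_all()

    proc = mp.get_context("fork").Process(target=target, args=args)
    proc.start()
    return proc
```

Closing connections in the parent discards any transaction that is currently open there, so this is probably best called from a point where no transaction is in progress.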
