
Is Python's multiprocessing Pool thread-safe?

I have a Django project. If I create a package-level variable that holds a Pool() object and try to use that Pool from Django views (which run in parallel), will this be thread-safe? Are there any other ways to do it?

from multiprocessing import Pool
general_executor_pool = Pool()
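One common alternative to creating the Pool at import time is to create it lazily, on first use, behind a lock, so that concurrent requests cannot race to build it. The sketch below is an illustration and does not come from this thread: get_pool, square, and handle_request are hypothetical names, processes=4 is an arbitrary choice, and it assumes that Pool.apply can be called from several threads at once, which is exactly the question under discussion here.

```python
import threading
from multiprocessing import Pool

# Hypothetical module-level state: the pool is created on first use rather
# than at import time, so merely importing this module starts no processes.
_pool = None
_pool_lock = threading.Lock()


def get_pool(processes=4):
    """Return the single shared Pool, creating it at most once."""
    global _pool
    if _pool is None:
        with _pool_lock:
            # Re-check under the lock: another thread may have created it
            # between our first check and acquiring the lock.
            if _pool is None:
                _pool = Pool(processes=processes)
    return _pool


def square(x):
    # Top-level function so it can be pickled and sent to worker processes.
    return x * x


def handle_request(n):
    # Hypothetical stand-in for a view body: submit work and wait for it.
    return get_pool().apply(square, (n,))
```

Creating the pool lazily also means that a worker process which re-imports the module (as happens with the "spawn" start method) does not try to build a second pool.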

I found this question via Google because I'm asking the same question. Anecdotally, I can say no, it is not, because I recently debugged a piece of software that suffered from race conditions. Here's how it went:

  1. A master process ran in a loop and, every 3 minutes, spawned a multiprocessing pool in a new thread with a list of ~1000 accounts to be acted on.
  2. The thread called multiprocessing.Pool(processes=32) and then pool.map(func, accounts). This would start 32 worker processes and hand each account, one by one, to an available process.
  3. Unbeknownst to the original author, this process took far longer than 3 minutes to complete. So what happened the next time a thread was spawned to create a multiprocessing pool? Did it spawn 32 new processes, for a total of 64? No, in practice it did not. Instead, my results were scrambled and showed signs that multiple threads were acting on my data in a non-deterministic way.
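One way to rule out the overlapping runs described in the steps above is to allow only one batch at a time and skip a cycle while the previous batch is still running. This is a minimal sketch, not the original author's code or fix: _batch_lock, process_account, and run_batch are hypothetical names, process_account is a placeholder for the real per-account work, and processes=4 is scaled down from 32 for illustration.

```python
import threading
from multiprocessing import Pool

# Hypothetical guard: at most one batch may run at any moment.
_batch_lock = threading.Lock()


def process_account(account):
    # Placeholder for the real per-account work (an assumption).
    return account * 2


def run_batch(accounts, results):
    # Non-blocking acquire: if the previous batch still holds the lock,
    # skip this cycle instead of starting a second, overlapping pool.
    if not _batch_lock.acquire(blocking=False):
        results.append(None)  # record that this cycle was skipped
        return
    try:
        with Pool(processes=4) as pool:
            results.append(pool.map(process_account, accounts))
    finally:
        _batch_lock.release()
```

A skipped cycle simply waits for the next tick of the master loop; whether skipping or queueing the batch is the right policy depends on the application.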

I'd love to trace through the multiprocessing module to see whether it is thread-unsafe by design, or to get an answer from someone in the know. Anecdotally, at least, I have witnessed first-hand that it is not thread-safe.

For the record, I had to check this, and it seems that multiprocessing.pool.Pool is indeed thread-safe. The following code does not trigger an AssertionError (tested with Python 3.6.9):

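import random
import time
import multiprocessing.pool
from threading import Thread

COUNT = 100  # number of values each thread submits

def return_value(value):
    # Runs in a worker process: sleep 0-1 s, then return the input unchanged.
    time.sleep(random.random())
    return value

def call_return_value(pool):
    # Each thread submits its own range of values to the shared pool.
    counter_start = random.randint(0, 100)
    expected = list(range(counter_start, counter_start + COUNT))
    pool_result = pool.imap_unordered(return_value, expected, chunksize=1)
    # Results arrive in arbitrary order but must be exactly this thread's values.
    assert sorted(pool_result) == expected

if __name__ == "__main__":
    # The guard stops worker processes from re-creating the pool under "spawn".
    with multiprocessing.pool.Pool() as pool:
        threads = [Thread(target=call_return_value, args=(pool,)) for _ in range(24)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # wait for every thread before the pool is torn down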

Basically, this code starts a process pool and launches 24 threads that call the return_value function through this pool. The function returns its argument after waiting for a random delay (between 0 and 1 s).

Of course, pool_result is no longer ordered, but it contains the correct set of elements, and this holds for every thread: values from different threads do not get mixed.
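A similar check can be made with the order-preserving pool.map instead of imap_unordered: each thread tags its inputs with its own id, and every thread should get back exactly its own values, in input order. This is a minimal sketch; echo, worker, and run are hypothetical names, processes=4 and the thread count are arbitrary, and a passing run is evidence under these conditions rather than a proof of thread safety.

```python
import threading
from multiprocessing.pool import Pool


def echo(pair):
    # Runs in a worker process: return the (thread_id, value) pair unchanged.
    return pair


def worker(pool, thread_id, out):
    inputs = [(thread_id, v) for v in range(50)]
    # map() returns results in input order, unlike imap_unordered().
    out[thread_id] = pool.map(echo, inputs)


def run(n_threads=8):
    out = {}  # each thread writes only its own key
    with Pool(processes=4) as pool:
        threads = [
            threading.Thread(target=worker, args=(pool, i, out))
            for i in range(n_threads)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return out
```

If results were routed to the wrong caller, some thread would see another thread's id in its list; if they were reordered, the lists would not match the inputs.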
