Update global variable when worker fails (Python multiprocessing.pool ThreadPool)

I have a Python function that requests data via an API and involves a rotating, expiring key. The volume of requests necessitates some parallelization of the function. I am doing this with ThreadPool from the multiprocessing.pool module. Example code:

import json

import requests
from multiprocessing.pool import ThreadPool
from tqdm import tqdm


# Function to get new authorization key.
def get_new_authorization():
    ...
    return auth_key


# Function calls API with passed headers for given URL and returns dict.
def api_call(headers_plus_result):
    headers, result = headers_plus_result
    r = requests.get(result["url"], headers=headers)
    return json.loads(r.text)


# Threading function with default num_threads.
def thread(worker, jobs, num_threads=5):
    pool = ThreadPool(num_threads)
    results = list()
    for result in tqdm(pool.imap_unordered(worker, jobs), total=len(jobs)):
        if result:
            results.append(result)
    pool.close()
    pool.join()
    if results:
        return results


# Input is a list-of-dicts results of a previous process.
results = [...]

# Process starts by retrieving an authorization key.
headers = {"authorization": get_new_authorization()}

# api_call() is called on each existing result with the retrieved key.
results = thread(api_call, [(headers, result) for result in results])

I am trying to modify my mapping process so that, when the first worker fails (i.e. the authorization key expires), all other workers are paused until a new authorization key is retrieved. Then the workers proceed with the new key.

Should this be inserted into the actual thread() function? If I put the exception handling in the api_call() function itself, I don't see how I can pause the pool or update the headers being passed to the other workers.

Additionally: is ThreadPool even the best method if I want this kind of flexibility?

A simpler possibility might be to use a multiprocessing.Event and a shared variable. The Event would indicate whether the authentication is currently valid, and the shared variable would hold the authentication itself.

import multiprocessing as mp

event = mp.Event()
# 100 = max length in bytes; 'c' (byte) arrays expose the .value attribute used below
sharedAuthentication = mp.Array('c', 100)

So a worker would run:

event.wait()  # blocks until the main thread has published a valid key
authentication = sharedAuthentication.value

Your main thread would initially set the authentication with:

sharedAuthentication.value = ....
event.set()
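The publish-then-signal step can be tried in a single process. This is only a minimal sketch: it assumes the shared array is created with typecode 'c', so the value is stored as bytes, and the token string and the name shared_auth are made up for illustration.

```python
import multiprocessing as mp

event = mp.Event()
shared_auth = mp.Array('c', 100)  # assumption: byte array, at most 100 bytes

# Publish the key first, then signal that it is ready to use.
shared_auth.value = b"token-1"    # hypothetical key; must be bytes
event.set()

# What a worker would do: wait for the signal, then read the key.
event.wait()
authentication = shared_auth.value.decode()
print(authentication)
```

Setting the value before calling event.set() matters: a worker released by the event should never be able to read a key that has not been written yet.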

and later modify the authentication with:

event.clear()  # workers calling event.wait() now block
# ... calculate new authentication
sharedAuthentication.value = .....
event.set()    # release the waiting workers
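Putting the pieces together for the ThreadPool case in the question, the following is one possible sketch rather than a definitive implementation. Because ThreadPool workers are threads in the same process, it swaps in threading.Event and an ordinary attribute in place of multiprocessing.Event and mp.Array. It also lets the worker that hits the expired key perform the refresh itself, which is an assumption beyond what is described above. FakeService, KeyHolder, make_worker and run are hypothetical names, and FakeService merely simulates an API whose key expires after three calls.

```python
import threading
from multiprocessing.pool import ThreadPool


class FakeService:
    """Hypothetical API whose key expires after `uses_per_key` calls."""

    def __init__(self, uses_per_key=3):
        self._lock = threading.Lock()
        self._uses_per_key = uses_per_key
        self._uses_left = 0
        self._current = None
        self.keys_issued = 0

    def new_key(self):
        with self._lock:
            self.keys_issued += 1
            self._current = f"key-{self.keys_issued}"
            self._uses_left = self._uses_per_key
            return self._current

    def call(self, key, url):
        with self._lock:
            if key != self._current or self._uses_left == 0:
                raise PermissionError("authorization key expired")
            self._uses_left -= 1
        return {"url": url, "key": key}


class KeyHolder:
    """Shared key plus an Event that is set only while the key is usable."""

    def __init__(self, service):
        self._service = service
        self._refresh_lock = threading.Lock()
        self.valid = threading.Event()
        self.key = service.new_key()
        self.valid.set()

    def refresh(self, stale_key):
        # Only the first worker to report a given stale key fetches a new one;
        # later reporters see that the key has already changed and do nothing.
        with self._refresh_lock:
            if self.key == stale_key:
                self.valid.clear()                  # pause the other workers
                self.key = self._service.new_key()
                self.valid.set()                    # let them continue


def make_worker(service, holder):
    def api_call(result):
        while True:
            holder.valid.wait()                     # block during a refresh
            key = holder.key
            try:
                return service.call(key, result["url"])
            except PermissionError:
                holder.refresh(key)                 # then retry with the new key
    return api_call


def run(urls, num_threads=5):
    service = FakeService()
    holder = KeyHolder(service)
    jobs = [{"url": u} for u in urls]
    with ThreadPool(num_threads) as pool:
        results = list(pool.imap_unordered(make_worker(service, holder), jobs))
    return results, service.keys_issued


if __name__ == "__main__":
    results, issued = run([f"https://example.invalid/{i}" for i in range(10)])
    print(len(results), "results using", issued, "keys")
```

With a real API, the PermissionError check would be replaced by whatever the service returns for an expired key (for example a specific HTTP status code), and the retry loop would probably need an upper bound on attempts.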
