
Rate Limiting API Requests in Python with Multiprocessing

I am using multiprocessing in Python to make parallel API requests. I have 8 cores on my machine (mp.cpu_count() == 8).

I am limited to roughly 6 requests per second. What would be the optimal way to make my API calls and process them?

Example code idea below, but it doesn't work as intended. I get a rapid-fire burst of 429s, then it backs off 10 seconds, but continues to get 429s again in rapid succession. My fear is that my machine is firing requests from all 8 cores so fast that it is overwhelming the service and not allowing any successful calls to come back.

import multiprocessing as mp
import time

import requests

url = "https://api.example.com"  # assumed API endpoint

def api_call(args):
    query = {'api_key': args[0], 'user_id': args[1]}
    resp = requests.get(url, params=query)
    # Handle too many requests: back off and retry until the call succeeds.
    while resp.status_code == 429:
        time.sleep(10)  # Back off 10 seconds.
        resp = requests.get(url, params=query)
    if resp.status_code == 200:
        data = resp.json()
        print(data)
        return data
   
if __name__ == "__main__":

    # Assume an iterable with api_key and other data to make request to API and populate query string
    iterable: list = [(api_key, other_data1), (api_key, other_data2)]

    with mp.Pool(mp.cpu_count()) as p:
        try:
            res: list = list(p.map(api_call, iterable))
        except KeyboardInterrupt:
            print("Terminating Multiprocess due to Keyboard Interrupt")
            p.terminate()
        else:
            p.close()
            p.join()

It sounds like you may have already solved your problem, but one solution worth considering is the use of a Semaphore to limit the number of active processes. This has the advantage that you can start as many tasks in parallel as you want, and then limit only the critical section that makes the web requests.

For example:

import multiprocessing

import requests


def init_worker(semaphore):
    # Make the shared semaphore visible in every worker process.
    global sem
    sem = semaphore


def task(id):
    print(f"start task {id}")
    with sem:  # at most four workers are inside this block at once
        res = requests.get("http://google.com")
        date_from_header = res.headers["date"]
    print(f"stop task {id}")
    return date_from_header


if __name__ == "__main__":
    mgr = multiprocessing.Manager()
    sem = mgr.Semaphore(4)

    with multiprocessing.Pool(
        processes=10, initializer=init_worker, initargs=(sem,)
    ) as pool:
        res = pool.map(task, range(1, 20))

    print(res)

Regardless of the size of your pool, this will only ever have four concurrent calls to requests.get at any given time. Once the request is complete, your tasks can execute other code in parallel.
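
Note that a semaphore caps concurrency, while the question's limit is expressed as a rate (roughly 6 requests per second). One way to enforce a rate across all workers is to share a lock and a last-request timestamp through the Manager, and have each worker wait for its slot before firing. This is a minimal sketch rather than the answer's code: https://api.example.com is a placeholder endpoint, and REQUESTS_PER_SECOND, throttled_get, and fetch are illustrative names.

import multiprocessing
import time

import requests

REQUESTS_PER_SECOND = 6  # assumed limit, taken from the question
MIN_INTERVAL = 1.0 / REQUESTS_PER_SECOND


def init_worker(shared_lock, shared_state):
    # Make the shared throttle state visible in every worker process.
    global lock, state
    lock = shared_lock
    state = shared_state


def throttled_get(url, **kwargs):
    # Wait until at least MIN_INTERVAL has elapsed since the last request
    # started by *any* worker, then claim the next slot.
    with lock:
        wait = state["last"] + MIN_INTERVAL - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        state["last"] = time.monotonic()
    return requests.get(url, **kwargs)


def fetch(user_id):
    resp = throttled_get("https://api.example.com", params={"user_id": user_id})
    return resp.status_code


if __name__ == "__main__":
    mgr = multiprocessing.Manager()
    lock = mgr.Lock()
    state = mgr.dict(last=0.0)

    with multiprocessing.Pool(
        processes=8, initializer=init_worker, initargs=(lock, state)
    ) as pool:
        print(pool.map(fetch, range(20)))

Sleeping inside the lock serializes only the pacing; once a worker has claimed its slot and released the lock, its request runs concurrently with the others.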
