multiprocessing.Pool and Rate limit

I'm making some API requests which are limited to 20 per second. Since the waiting time for a response is about 0.5 seconds, I thought I would use multiprocessing.Pool.map together with this rate-limiting decorator. So my code looks like:

import time
from multiprocessing import Pool

def f(vec):
    # do stuff (make one API request)
    pass

def RateLimited(maxPerSecond):
    minInterval = 1.0 / float(maxPerSecond)
    def decorate(func):
        lastTimeCalled = [0.0]
        def rateLimitedFunction(*args, **kwargs):
            # time.monotonic() replaces time.clock(), which was removed in Python 3.8
            elapsed = time.monotonic() - lastTimeCalled[0]
            leftToWait = minInterval - elapsed
            if leftToWait > 0:
                time.sleep(leftToWait)
            ret = func(*args, **kwargs)
            lastTimeCalled[0] = time.monotonic()
            return ret
        return rateLimitedFunction
    return decorate

@RateLimited(20)
def multi(vec):
    p = Pool(5)
    return p.map(f, vec)

I have 4 cores, the program works fine, and there is an improvement in time compared to the loop version. Furthermore, when the Pool argument is 4, 5, or 6 it works, and the time is smaller for Pool(6), but when I use 7 or more I get errors (too many connections per second, I guess).

However, if my function is more complicated and can make 1-5 requests per call, the decorator doesn't work as expected. What else can I use in this case?

UPDATE

For anyone looking to use Pool: remember to close it, otherwise you are going to use all the RAM.

def multi(vec):
    p = Pool(5)
    res = p.map(f, vec)
    p.close()
    return res
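
A related option, as a minimal sketch on my part (it assumes Python 3.3+, where Pool works as a context manager), is to let a with block clean up for you; on exit the pool is terminated, which is fine here because map has already collected every result:

from multiprocessing import Pool

def multi(vec):
    # Exiting the with block terminates the pool, so no idle workers (or their
    # memory) are left behind even if map() raises.
    with Pool(5) as p:
        return p.map(f, vec)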

UPDATE 2

I found that something like this WebRequestManager can do the trick. The problem is that it doesn't work with multiprocessing.Pool and 19-20 processes, because the time is stored in the class you need to call when you run the request.

Your indents are inconsistent up above, which makes it harder to answer this, but I'll take a stab.

It looks like you're rate limiting the wrong thing; if f is supposed to be limited, you need to limit the calls to f, not the calls to multi. Doing this in something that's getting dispatched to the Pool won't work, because the forked workers would each be limiting independently (forked processes have independent tracking of the time since the last call).
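
To illustrate (a small sketch of my own, with hypothetical names, not code from the original post): per-process state, whether it is a closure cell like lastTimeCalled or a module-level list, ends up as a separate copy in every worker, so each process only ever sees its own call history:

import os
import time
from multiprocessing import Pool

last_called = [0.0]  # each worker process gets its own independent copy

def probe(_):
    # Report the worker's pid and the elapsed time it thinks has passed since
    # its last call; the first call in every worker sees a huge value because
    # calls made by the other workers are invisible to it.
    elapsed = time.monotonic() - last_called[0]
    last_called[0] = time.monotonic()
    return os.getpid(), round(elapsed, 3)

if __name__ == '__main__':
    with Pool(4) as p:
        print(p.map(probe, range(8)))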

The easiest way to do this would be to limit how quickly the iterator that the Pool pulls from produces results. For example:

import collections
import time
def rate_limited_iterator(iterable, limit_per_second):
    # Initially, we can run immediately limit times
    runats = collections.deque([time.time()] * limit_per_second)
    for x in iterable:
        runat, now = runats.popleft(), time.time()
        if now < runat:
            time.sleep(runat - now)
        runats.append(time.time() + 1)
        yield x

def multi(vec):
    p = Pool(5)
    return p.map(f, rate_limited_iterator(vec, 20))
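
One caveat, as an observation of my own rather than part of the original answer: Pool.map turns a generator into a list before dispatching anything, so with map the sleeps all happen up front in the parent rather than between dispatches. Pool.imap pulls from the iterator lazily, so a variant like this (same rate_limited_iterator and f as above) keeps the 20-per-second pacing while requests are actually in flight:

from multiprocessing import Pool

def multi(vec):
    # imap consumes rate_limited_iterator lazily, so tasks reach the workers
    # at most 20 per second instead of being queued all at once; the with
    # block also shuts the pool down once every result has been collected.
    with Pool(5) as p:
        return list(p.imap(f, rate_limited_iterator(vec, 20)))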
