
celery + eventlet = 100% CPU usage

We are using celery to get flights data from different travel agencies; every request takes ~20-30 seconds (most agencies require a request sequence: authorize, send request, poll for results).

A typical celery task looks like this:

from urlparse import urljoin                # urllib.parse on Python 3
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
import random

from eventlet.green import urllib2, time

# MAIN_URL, RESULT_TMP_FOLDER, log and parse_agent_results are defined elsewhere.

def get_results(attr, **kwargs):
    search, provider, minprice = attr
    data = XXX  # prepared data
    host = urljoin(MAIN_URL, "RPCService/Flights_SearchStart")
    req = urllib2.Request(host, data, {'Content-Type': 'text/xml'})
    try:
        response_stream = urllib2.urlopen(req)
    except urllib2.URLError as e:
        return [search, None]
    response = response_stream.read()
    rsp_host = urljoin(MAIN_URL, "RPCService/FlightSearchResults_Get")
    rsp_req = urllib2.Request(rsp_host, response, {'Content-Type': 'text/xml'})
    ready = False
    sleeptime = 1
    rsp_response = ''
    while not ready:
        time.sleep(sleeptime)
        try:
            rsp_response_stream = urllib2.urlopen(rsp_req)
        except urllib2.URLError as e:
            log.error('go2see: results fetch failed for %s IOError %s' % (search.id, str(e)))
        else:
            rsp_response = rsp_response_stream.read()
            try:
                rsp = parseString(rsp_response)
            except ExpatError as e:
                return [search, None]
            else:
                ready = rsp.getElementsByTagName('SearchResultEx')[0].getElementsByTagName('IsReady')[0].firstChild.data
                ready = (ready == 'true')
        sleeptime += 1
        if sleeptime > 10:
            return [search, None]
    hash = "%032x" % random.getrandbits(128)
    with open(RESULT_TMP_FOLDER + hash, 'w+') as f:  # close the file instead of leaking the handle
        f.write(rsp_response)
    # call to parser
    parse_agent_results.apply_async(queue='parsers', args=[__name__, search, provider, hash])

These tasks run in an eventlet pool with concurrency 300, prefetch_multiplier = 1, broker_limit = 300. When ~100-200 tasks are fetched from the queue, CPU usage rises to 100% (a whole CPU core is used) and task fetching from the queue is performed with delays.
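For reference, a minimal sketch of what such a worker configuration might look like (setting names assume Celery 3.x; "broker_limit" is taken to mean BROKER_POOL_LIMIT):

# celeryconfig.py -- a sketch of the setup described above
CELERYD_POOL = 'eventlet'          # green-thread execution pool
CELERYD_CONCURRENCY = 300          # 300 greenlets per worker
CELERYD_PREFETCH_MULTIPLIER = 1    # fetch one message per greenlet at a time
BROKER_POOL_LIMIT = 300            # broker connection pool size

# roughly equivalent command line:
#   celery worker -P eventlet -c 300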

Could you please point out possible issues: blocking operations (the eventlet ALARM DETECTOR gives no exceptions), wrong architecture, or whatever.

Sorry for the late response.

The thing I would try first in such a situation is to turn off Eventlet completely in both Celery and your code and use the process or OS thread model. 300 threads or even processes is not that much load for the OS scheduler (although you may lack the memory to run that many processes). So I would try it and see if CPU load drops dramatically. If it does not, then the problem is in your code and Eventlet can't magically fix it. If it does drop, however, we would need to investigate the issue more closely.
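As a sketch of that experiment, the change is mostly configuration rather than code:

# Swap the green libraries back for the standard blocking ones:
import urllib2, time  # instead of: from eventlet.green import urllib2, time

# ...and start the worker with a process pool, e.g.:
#   celery worker --pool=prefork --concurrency=50
# (prefork processes are heavier than greenlets, so a lower
#  concurrency may be needed to fit in memory)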

If the bug still persists, please report it.

A problem occurs if you fire 200 requests at a server: responses could be delayed, and therefore urllib.urlopen will hang.
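One way to guard against that (a sketch, not part of the original answer) is to pass urlopen's timeout argument, available since Python 2.6, so a stalled server raises an error instead of hanging the greenlet:

import socket

try:
    # a stalled server now fails fast instead of blocking forever
    response_stream = urllib2.urlopen(req, timeout=30)
except (urllib2.URLError, socket.timeout) as e:
    return [search, None]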

Another thing I noticed: if a URLError is raised, the program stays in the while loop until sleeptime is greater than 10. So a URLError will make this script sleep for up to 55 seconds (1 + 2 + 3 ... + 10).
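A sketch of how the loop could give up after repeated failures instead of sleeping through the whole back-off sequence (the error limit here is a hypothetical choice):

errors = 0
while not ready:
    time.sleep(sleeptime)
    try:
        rsp_response_stream = urllib2.urlopen(rsp_req)
    except urllib2.URLError as e:
        errors += 1
        if errors >= 3:  # hypothetical cap: bail out after three failed polls
            return [search, None]
    else:
        pass  # parse the response and set `ready`, as in the original task
    sleeptime += 1
    if sleeptime > 10:
        return [search, None]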
