Tuning Celery for high performance

Question

I'm trying to send ~400 HTTP GET requests and collect the results. I'm running this from Django. My solution was to use Celery with gevent.

To start the Celery tasks I call get_reports:

def get_reports(self, clients, *args, **kw):
    sub_tasks = []
    for client in clients:
        # Build one signature per client and route it to the io_bound queue.
        s = self.get_report_task.s(self, client, *args, **kw).set(queue='io_bound')
        sub_tasks.append(s)
    res = celery.group(*sub_tasks)()
    reports = res.get(timeout=30, interval=0.001)
    return reports

@celery.task
def get_report_task(self, client, *args, **kw):
    report = send_http_request(...)
    return report

I use 4 workers:

manage celery worker -P gevent --concurrency=100 -n a0 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a1 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a2 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a3 -Q io_bound

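I'm using RabbitMQ as the broker.

Although this works much faster than running the requests sequentially (400 requests take about 23 seconds), I noticed that most of that time is overhead from Celery itself, i.e. if I change get_report_task like this: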

@celery.task
def get_report_task(self, client, *args, **kw):
    return []

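the whole operation takes ~19 seconds. That means I spend 19 seconds just sending all the tasks to Celery and getting the results back.

The queuing rate of messages to RabbitMQ seems to be capped at 28 messages/sec, and I think this is my bottleneck.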
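As a rough sanity check of that suspicion, the arithmetic can be sketched as follows. It assumes the 28 messages/sec figure applies to all 400 task messages and that publishing is not overlapped with anything else; the helper name is made up for illustration.

```python
def publish_seconds(num_messages, messages_per_second):
    """Time needed to enqueue num_messages at a fixed publish rate."""
    return num_messages / messages_per_second

NUM_TASKS = 400            # requests in one get_reports call
PUBLISH_RATE = 28.0        # messages/sec observed on the broker
TOTAL_NOOP_SECONDS = 19.0  # measured run time with the no-op task

publish_time = publish_seconds(NUM_TASKS, PUBLISH_RATE)
share = publish_time / TOTAL_NOOP_SECONDS
print(f"publishing alone: {publish_time:.1f} s ({share:.0%} of the no-op run)")
```

Under those assumptions, enqueueing alone would account for roughly three quarters of the 19 seconds, which is consistent with the publish rate being the main bottleneck.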

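I'm running on a Windows 8 machine, if that matters.

Some of the things I've tried:

using Redis as the broker
using Redis as the result backend
tweaking these settings: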
BROKER_POOL_LIMIT = 500
CELERYD_PREFETCH_MULTIPLIER = 0
CELERYD_MAX_TASKS_PER_CHILD = 100
CELERY_ACKS_LATE = False
CELERY_DISABLE_RATE_LIMITS = True

I'm looking for any suggestions that will help speed things up.

Answer 1

Are you really running on Windows 8 without a virtual machine? I ran the following simple test on a 2-core MacBook with 8 GB of RAM running OS X 10.7:

import celery
from time import time

@celery.task
def test_task(i):
    return i

grp = celery.group(test_task.s(i) for i in range(400))
tic1 = time(); res = grp(); tac1 = time()
print 'queued in', tac1 - tic1
tic2 = time(); vals = res.get(); tac2 = time()
print 'executed in', tac2 - tic2

I used Redis as the broker, Postgres as the result backend, and the default worker with --concurrency=4. Guess what the output is? Here it is:

queued in 3.5009469986

executed in 2.99818301201

Answer 2

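OK, there were two separate issues.

First, the task was a member method. After extracting it from the class, the time went down to about 12 seconds. I can only assume it has something to do with pickling self.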
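To illustrate why passing self could matter, here is a minimal sketch using only the standard library. It assumes the task arguments are serialized with pickle; the `fetcher` object and `client` value are hypothetical stand-ins, with a `SimpleNamespace` playing the role of a class instance that carries some state.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for the object that owned get_report_task.
# Real instances often hold caches, configuration, connections and so on.
fetcher = SimpleNamespace(cache={i: str(i) * 50 for i in range(1000)})
client = "client-42"

# Task as a method, called with .s(self, client): self travels in every message.
with_self = len(pickle.dumps((fetcher, client)))

# Task as a module-level function, called with .s(client): only client is sent.
without_self = len(pickle.dumps((client,)))

print("payload with self:   ", with_self, "bytes")
print("payload without self:", without_self, "bytes")
```

With 400 tasks, the larger payload is serialized, sent through the broker and deserialized 400 times, so the difference adds up quickly.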

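Second, it was running on Windows. After running it on my Linux machine, the run time was under 2 seconds. I guess Windows just isn't cut out for high performance.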

Answer 3

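How about using Twisted instead? You can end up with a much simpler application structure. You can send all 400 requests from the Django process at once and wait for all of them to finish. This works concurrently because Twisted sets the sockets to non-blocking mode and only reads data when it is available.

I had a similar problem a while ago and built a nice bridge between Twisted and Django. I have been running it in production for almost a year now. You can find it here: https://github.com/kowalski/featdjango/. In short, the main application thread runs the main Twisted reactor loop, and the rendering of Django views is delegated to a thread. It uses a special thread pool that exposes methods for interacting with the reactor and using its asynchronous features.

If you use it, your code would look like this: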
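The general idea of issuing every request up front and waiting for all of them can be sketched with the standard library's asyncio instead of Twisted. This is an illustration of the concurrency pattern only: `fake_get` is a made-up stand-in that sleeps rather than performing a real HTTP request, and the URLs are placeholders.

```python
import asyncio
import time

async def fake_get(url, delay=0.05):
    # Stand-in for a non-blocking HTTP GET: while this "request" waits,
    # the event loop is free to progress all the other requests.
    await asyncio.sleep(delay)
    return url, 200

async def fetch_all(urls):
    # Start every request at once and wait until all have completed.
    return await asyncio.gather(*(fake_get(u) for u in urls))

urls = [f"http://example.invalid/report/{i}" for i in range(400)]

start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start

print(f"{len(results)} responses in {elapsed:.2f} s")
```

Run sequentially, 400 waits of 0.05 s would take about 20 s; run concurrently, the total stays close to the duration of a single wait.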

from twisted.internet import defer
from twisted.web.client import getPage

import threading


def get_reports(self, urls, *args, **kw):
    ct = threading.current_thread()

    defers = list()
    for url in urls:
        # here the Deferred is created which will fire when
        # the call is complete
        # *args arrives as a tuple, so convert it before concatenating
        d = ct.call_async(getPage, args=[url] + list(args), kwargs=kw)
        # here we keep it for reference
        defers.append(d)

    # here we create a Deferred which will fire when all the
    # constituent Deferreds are completed
    deferred_list = defer.DeferredList(defers, consumeErrors=True)
    # here we tell the current thread to wait until we are done
    results = ct.wait_for_defer(deferred_list)

    # the results is a list of the form (C{bool} success flag, result)
    # below unpack it
    reports = list()
    for success, result in results:
        if success:
            reports.append(result)
        else:
            # here handle the failure, or just ignore
            pass

    return reports

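There is still a lot you can optimize here. Every call to getPage() creates a separate TCP connection and closes it when it is done. That is as good as it gets, provided each of your 400 requests goes to a different host. If that is not the case, you can use an HTTP connection pool, which uses persistent connections and HTTP pipelining. You instantiate it like this: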

from feat.web import httpclient

pool = httpclient.ConnectionPool(host, port, maximum_connections=3)

Then a single request is performed like this (this replaces the getPage() call):

d = ct.call_async(pool.request, args=(method, path, headers, body))
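The effect of persistent connections can be shown with a small self-contained sketch. It uses the standard library's http.client against a throwaway local server rather than feat's ConnectionPool, and `count_connections` is a helper invented for this illustration. The server records the client port of each request, so the number of distinct ports equals the number of TCP connections that were opened.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow keep-alive connections
    seen_ports = set()

    def do_GET(self):
        # Each TCP connection arrives from its own client port.
        Handler.seen_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet

def count_connections(reuse, n=5):
    """Send n GETs and return how many TCP connections the server saw."""
    Handler.seen_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(n):
            if conn is None or not reuse:
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()
            if not reuse:
                conn.close()
        if reuse:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(Handler.seen_ports)

print("connections with reuse:   ", count_connections(reuse=True))
print("connections without reuse:", count_connections(reuse=False))
```

When many of the 400 requests target the same host, reusing a connection avoids repeating the TCP handshake for every request.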

3 answers

Answer 1: score 6, posted 2013-09-18 10:26:57
Answer 2: score 2, accepted, posted 2013-09-24 20:44:01
Answer 3: score 0, posted 2013-09-17 08:59:50
