
Tweaking celery for high performance

I'm trying to send ~400 HTTP GET requests and collect the results. I'm running from Django. My solution was to use celery with gevent.

To start the celery tasks I call get_reports:

def get_reports(self, clients, *args, **kw):
    sub_tasks = []
    for client in clients:
        # build a subtask signature per client, routed to the io_bound queue
        s = self.get_report_task.s(self, client, *args, **kw).set(queue='io_bound')
        sub_tasks.append(s)
    res = celery.group(*sub_tasks)()
    reports = res.get(timeout=30, interval=0.001)
    return reports

@celery.task
def get_report_task(self, client, *args, **kw):
    report = send_http_request(...)
    return report

I use 4 workers:

manage celery worker -P gevent --concurrency=100 -n a0 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a1 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a2 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a3 -Q io_bound
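
For completeness, a minimal sketch of the queue declaration this routing assumes (with Celery's default CELERY_CREATE_MISSING_QUEUES = True the queue is auto-created, so declaring it explicitly is optional):

from kombu import Queue

# assumed Django settings for the io_bound queue used above
CELERY_QUEUES = (
    Queue('celery'),    # default queue
    Queue('io_bound'),  # consumed by the four gevent workers above
)
CELERY_DEFAULT_QUEUE = 'celery'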

And I use RabbitMQ as the broker.

And although it works much faster than running the requests sequentially (400 requests took ~23 seconds), I noticed that most of that time was overhead from celery itself, i.e. if I changed get_report_task like this:

@celery.task
def get_report_task(self, client, *args, **kw):
    return []

this whole operation took ~19 seconds. That means that I spent 19 seconds just sending all the tasks to celery and getting the results back.

The queuing rate of messages to RabbitMQ seems to be capped at about 28 messages/sec, and I think this is my bottleneck.

I'm running on a Windows 8 machine, if that matters.

Some of the things I've tried:

  • using Redis as the broker
  • using Redis as the results backend
  • tweaking these settings:

    BROKER_POOL_LIMIT = 500

    CELERYD_PREFETCH_MULTIPLIER = 0

    CELERYD_MAX_TASKS_PER_CHILD = 100

    CELERY_ACKS_LATE = False

    CELERY_DISABLE_RATE_LIMITS = True

I'm looking for any suggestions that will help speed things up.

Are you really running on Windows 8 without a virtual machine? I did the following simple test on a 2-core MacBook with 8 GB RAM running OS X 10.7:

import celery
from time import time

@celery.task
def test_task(i):
    return i

grp = celery.group(test_task.s(i) for i in range(400))
tic1 = time(); res = grp(); tac1 = time()
print 'queued in', tac1 - tic1
tic2 = time(); vals = res.get(); tac2 = time()
print 'executed in', tac2 - tic2

I'm using Redis as the broker, Postgres as the result backend, and the default worker with --concurrency=4. Guess what the output is? Here it is:

queued in 3.5009469986

executed in 2.99818301201

Well, it turns out I had 2 separate issues.

First off, the task was a member method. After extracting it out of the class, the time went down to about 12 seconds. I can only assume it has something to do with the pickling of self.
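
For illustration, a minimal before/after sketch of that change (the surrounding class isn't shown in the question, so its shape here is an assumption; send_http_request(...) is the question's own placeholder):

# before (assumed shape): the task is a member method, so `self` is
# serialized and sent along with every message
class ReportClient(object):
    @celery.task
    def get_report_task(self, client, *args, **kw):
        return send_http_request(...)

# after: a plain module-level task; only the real arguments get pickled
@celery.task
def get_report_task(client, *args, **kw):
    return send_http_request(...)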

The second thing was the fact that it ran on Windows. After running it on my Linux machine, the run time was less than 2 seconds. Guess Windows just isn't cut out for high performance...

How about using Twisted instead? You can end up with a much simpler application structure. You can send all 400 requests from the Django process at once and wait for all of them to finish. This works concurrently because Twisted sets the sockets to non-blocking mode and only reads data when it's available.

I had a similar problem a while ago and developed a nice bridge between Twisted and Django. I've been running it in a production environment for almost a year now. You can find it here: https://github.com/kowalski/featdjango/. In simple words, the main application thread runs the Twisted reactor loop, and Django view rendering is delegated to a thread. It uses a special threadpool, which exposes methods to interact with the reactor and use its asynchronous capabilities.

If you use it, your code would look like this:

from twisted.internet import defer
from twisted.web.client import getPage

import threading


def get_reports(self, urls, *args, **kw):
    ct = threading.current_thread()

    defers = list()
    for url in urls:
        # here the Deferred is created which will fire when
        # the call is complete
        d = ct.call_async(getPage, args=[url] + args, kwargs=kw)
        # here we keep it for reference
        defers.append(d)

    # here we create a Deferred which will fire when all the
    # constituent Deferreds have completed
    deferred_list = defer.DeferredList(defers, consumeErrors=True)
    # here we tell the current thread to wait until we are done
    results = ct.wait_for_defer(deferred_list)

    # results is a list of (C{bool} success flag, result) tuples;
    # unpack it below
    reports = list()
    for success, result in results:
        if success:
            reports.append(result)
        else:
            # here handle the failure, or just ignore
            pass

    return reports

There is still a lot you can optimize here. As written, every call to getPage() creates a separate TCP connection and closes it when it's done. This is as optimal as it can be, provided that each of your 400 requests is sent to a different host. If that is not the case, you can use an HTTP connection pool, which uses persistent connections and HTTP pipelining. You instantiate it like this:

from feat.web import httpclient

pool = httpclient.ConnectionPool(host, port, maximum_connections=3)

Then a single request is performed like this (this goes in place of the getPage() call):

d = ct.call_async(pool.request, args=(method, path, headers, body))
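
Putting the pool together with the earlier get_reports() loop, a hedged sketch (host, port, headers and body are placeholders, and the pool/threadpool APIs are used only as shown above):

from twisted.internet import defer
from feat.web import httpclient
import threading

# placeholder target: the pool talks to a single host, which is exactly
# when persistent connections pay off
pool = httpclient.ConnectionPool(host, port, maximum_connections=3)

def get_reports(self, paths, *args, **kw):
    ct = threading.current_thread()
    defers = []
    for path in paths:
        # same pattern as before, but each request reuses a pooled connection
        d = ct.call_async(pool.request, args=('GET', path, headers, body))
        defers.append(d)
    results = ct.wait_for_defer(defer.DeferredList(defers, consumeErrors=True))
    # keep only the successful results, as before
    return [result for success, result in results if success]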
