How to achieve parallelism with tornado gen.Task / gen.coroutine decorators
Here is a case where one must bring parallelism into the backend server.
I want to query N ELBs, with 5 different queries each, and send the results back to the web client.
The backend is Tornado, and according to what I have read many times in the docs, I should be able to get several tasks processed in parallel if I use @gen.Task or gen.coroutine.
However, I must be missing something here, as all my requests (20 in total: 4 ELBs * 5 queries) are processed one after another.
def query_elb(fn, region, elb_name, period, callback):
    callback(fn(region, elb_name, period))

class DashboardELBHandler(RequestHandler):
    @tornado.gen.coroutine
    def get_elb_info(self, region, elb_name, period):
        elbReq = yield gen.Task(query_elb, ELBSumRequest, region, elb_name, period)
        elb2XX = yield gen.Task(query_elb, ELBBackend2XX, region, elb_name, period)
        elb3XX = yield gen.Task(query_elb, ELBBackend3XX, region, elb_name, period)
        elb4XX = yield gen.Task(query_elb, ELBBackend4XX, region, elb_name, period)
        elb5XX = yield gen.Task(query_elb, ELBBackend5XX, region, elb_name, period)
        raise tornado.gen.Return(
            [
                elbReq,
                elb2XX,
                elb3XX,
                elb4XX,
                elb5XX,
            ]
        )
    @tornado.web.authenticated
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def post(self):
        ret = []
        period = self.get_argument("period", "5m")
        cloud_deployment = db.foo.bar.baz()
        for region, deployment in cloud_deployment.iteritems():
            elb_name = deployment["elb"][0]
            res = yield self.get_elb_info(region, elb_name, period)
            ret.append(res)
        self.push_json(ret)
def ELBQuery(region, elb_name, range_name, metric, statistic, unit):
    dimensions = {u"LoadBalancerName": [elb_name]}
    ((start, stop), period) = calc_range(range_name)
    cw = boto.ec2.cloudwatch.connect_to_region(region)
    data_points = cw.get_metric_statistics(period, start, stop,
                                           metric, "AWS/ELB", statistic, dimensions, unit)
    return data_points
ELBSumRequest = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "RequestCount", "Sum", "Count")
ELBLatency = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "Latency", "Average", "Seconds")
ELBBackend2XX = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "HTTPCode_Backend_2XX", "Sum", "Count")
ELBBackend3XX = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "HTTPCode_Backend_3XX", "Sum", "Count")
ELBBackend4XX = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "HTTPCode_Backend_4XX", "Sum", "Count")
ELBBackend5XX = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name, "HTTPCode_Backend_5XX", "Sum", "Count")
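The concurrency pattern the question is reaching for is to launch all five queries at once and then wait on them as a batch (in Tornado, a coroutine can yield a list of tasks/futures: `yield [task1, task2, ...]`). Here is a minimal sketch of that shape using only stdlib asyncio, with a hypothetical `fake_query` standing in for one non-blocking ELB query; note that batching only helps once the underlying call is genuinely non-blocking:

```python
import asyncio
import time

# Hypothetical stand-in for one *non-blocking* ELB query (the real
# query_elb/boto call above is blocking; assume ~0.1 s of pure I/O wait).
async def fake_query(metric, delay=0.1):
    await asyncio.sleep(delay)
    return metric

METRICS = ["RequestCount", "HTTPCode_Backend_2XX", "HTTPCode_Backend_3XX",
           "HTTPCode_Backend_4XX", "HTTPCode_Backend_5XX"]

async def get_elb_info():
    # Start all five queries before waiting on any of them, so their
    # waits overlap instead of running back to back.
    return await asyncio.gather(*(fake_query(m) for m in METRICS))

start = time.monotonic()
results = asyncio.run(get_elb_info())
elapsed = time.monotonic() - start  # roughly one query's latency, not five
```

The same batching in the handler above would be `yield [gen.Task(...), gen.Task(...), ...]`, but as written it would not speed anything up, for the reason explained in the answer.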
The problem is that ELBQuery is a blocking function. If it doesn't yield to another coroutine somewhere, there is no way for the coroutine scheduler to interleave the calls. (That's the whole point of coroutines: they're cooperative, not preemptive.)
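The cooperative model can be seen without Tornado at all. This minimal round-robin scheduler over plain Python generators (all names here are illustrative) can only switch tasks at `yield` points, so a task that never yields would run to completion before anyone else got a turn:

```python
def run_round_robin(tasks):
    """Drive each generator one step at a time until all are exhausted."""
    log = []
    tasks = list(tasks)
    while tasks:
        task = tasks.pop(0)
        try:
            log.append(next(task))   # run the task up to its next yield
            tasks.append(task)       # it yielded, so re-queue it
        except StopIteration:
            pass                     # task finished
    return log

def worker(name, steps):
    for i in range(steps):
        yield "%s-%d" % (name, i)    # yield = "let someone else run"

events = run_round_robin([worker("a", 2), worker("b", 2)])
# events == ['a-0', 'b-0', 'a-1', 'b-1']: the two workers interleave
```

A worker that did all its work before its first yield would monopolize the scheduler in exactly the way the blocking ELBQuery does.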
If the problem is something like the calc_range call, that would probably be easy to deal with: break it up into smaller pieces, where each piece yields to the next, which gives the scheduler a chance to get in between the pieces.
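That chunking strategy can be sketched with stdlib asyncio (since calc_range isn't shown in the question, `chunked_sum` below is a hypothetical stand-in for the long computation; in Tornado >= 4.0 the per-chunk yield point is spelled `yield gen.moment`):

```python
import asyncio

async def chunked_sum(values, chunk_size=1000):
    # Process the data in slices, yielding to the event loop between
    # slices so other coroutines get a chance to run.
    total = 0
    for i in range(0, len(values), chunk_size):
        total += sum(values[i:i + chunk_size])
        await asyncio.sleep(0)  # cooperative yield point between chunks
    return total

async def heartbeat_task(log):
    # A second coroutine that can only make progress while chunked_sum
    # is suspended at one of its yield points.
    for _ in range(5):
        log.append("tick")
        await asyncio.sleep(0)

async def main():
    log = []
    total, _ = await asyncio.gather(chunked_sum(list(range(10000))),
                                    heartbeat_task(log))
    return total, log

total, log = asyncio.run(main())
# total == 49995000, and the heartbeat ran alongside the computation
```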
But most likely it's the boto calls that are blocking, and most of your function's time is spent waiting for get_metric_statistics to return, while nothing else can run.
So, how do you fix this? A few options:

1. Find an async-friendly replacement for boto, such as the asyncboto project.
2. Use greenlets and monkeypatch enough of the library's dependencies to trick it into being async. This sounds hacky, but it may actually be the best solution.
3. Use greenlets and monkeypatch the whole stdlib a la gevent to trick boto and tornado into working together without even realizing it. This sounds like a terrible idea.
4. Switch from tornado to gevent.
5. Move the blocking boto calls into separate threads or processes (or even a pool of them).

Without knowing more details, I'd suggest looking at #2 and #4 first, but I can't promise they'll turn out to be the best answer for you.
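A minimal sketch of the thread-pool route, using only the stdlib: submit every blocking call to a ThreadPoolExecutor first, so all the waits overlap, then collect the results. `slow_query` is a hypothetical stand-in for the blocking ELBQuery; recent Tornado versions let a coroutine yield a `concurrent.futures.Future` directly (or wrap the call with `IOLoop.run_in_executor`), so the same pool plugs straight into the handler:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_query(metric):
    # Stand-in for the blocking ELBQuery/boto call: ~0.2 s of blocking wait.
    time.sleep(0.2)
    return metric

metrics = ["RequestCount", "HTTPCode_Backend_2XX", "HTTPCode_Backend_3XX",
           "HTTPCode_Backend_4XX", "HTTPCode_Backend_5XX"]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    # Submit everything first so all five calls block in parallel threads...
    futures = [pool.submit(slow_query, m) for m in metrics]
    # ...then collect results. In a Tornado coroutine you would yield
    # each future (or the whole list) instead of calling .result().
    results = [f.result() for f in futures]
elapsed = time.monotonic() - start  # about 0.2 s rather than the ~1.0 s serial total
```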