
Tornado memory leak on dropped connections

I have a setup where Tornado acts as a kind of pass-through for workers. Tornado receives a request, forwards it to N workers, aggregates the results, and sends them back to the client. This works fine, except that when a timeout occurs for some reason, I get a memory leak.

My setup is similar to this pseudocode:

import json

import tornado.httpclient
import tornado.ioloop
import tornado.web

workers = ["http://worker1.example.com:1234/",
           "http://worker2.example.com:1234/",
           "http://worker3.example.com:1234/"]  # ... more workers

class MyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        responses = []

        # Collect each worker response; finish once all of them have arrived.
        def __callback(response):
            responses.append(response)
            if len(responses) == len(workers):
                self._finish_req(responses)

        # Fan the incoming request out to every worker.
        for url in workers:
            async_client = tornado.httpclient.AsyncHTTPClient()
            # Assumes the incoming body is forwarded unchanged (`body` was undefined in the original).
            request = tornado.httpclient.HTTPRequest(url, method=self.request.method, body=self.request.body)
            async_client.fetch(request, __callback)

    def _finish_req(self, responses):
        good_responses = [r for r in responses if not r.error]
        if not good_responses:
            raise tornado.web.HTTPError(500, "\n".join(str(r.error) for r in responses))
        results = aggregate_results(good_responses)  # aggregate_results is defined elsewhere
        self.set_header("Content-Type", "application/json")
        self.write(json.dumps(results))
        self.finish()

application = tornado.web.Application([
    (r"/", MyHandler),
])

if __name__ == "__main__":
    ##.. some locking code
    application.listen()  # port argument omitted in the original
    tornado.ioloop.IOLoop.instance().start()

What am I doing wrong? Where does the memory leak come from?

I don't know the source of the problem, and it seems the garbage collector should be able to take care of it, but there are two things you can try.

The first would be to simplify some of the references (it looks like there may still be references to responses when the RequestHandler completes):

class MyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        # Keep the responses on the handler instead of in a closure.
        self.responses = []

        for url in workers:
            async_client = tornado.httpclient.AsyncHTTPClient()
            # Assumes the incoming body is forwarded unchanged (`body` was undefined in the original).
            request = tornado.httpclient.HTTPRequest(url, method=self.request.method, body=self.request.body)
            async_client.fetch(request, self._handle_worker_response)

    def _handle_worker_response(self, response):
        self.responses.append(response)
        if len(self.responses) == len(workers):
            self._finish_req()

    def _finish_req(self):
        ...  # elided in the original answer

If that doesn't work, you can always invoke garbage collection manually:

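import gc

class MyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def post(self):
        ...  # elided in the original answer

    def _finish_req(self):
        ...  # elided in the original answer

    def on_connection_close(self):
        # Force a full collection whenever the client drops the connection.
        gc.collect()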
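To see why a manual collection could matter, here is a minimal standard-library sketch, not Tornado code. It assumes the handler ends up in a reference cycle with its own callback closure; the `Handler` class and the `cycle_survives_until_collect` function are hypothetical names used only for illustration. On CPython, an object caught in such a cycle is not freed by reference counting alone and stays alive until the cyclic collector runs.

```python
import gc
import weakref


class Handler:
    """Hypothetical stand-in for a request handler; not a Tornado class."""

    def start(self):
        responses = []

        def callback(response):
            # The closure captures both `responses` and `self`.
            responses.append(response)
            return self

        # Storing the closure on the instance creates a cycle:
        # handler -> callback -> closure cell -> handler
        self.callback = callback


def cycle_survives_until_collect():
    """Return (alive_before_collect, alive_after_collect) for a cyclic handler."""
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        handler = Handler()
        handler.start()
        probe = weakref.ref(handler)
        del handler  # drop the only external reference
        alive_before = probe() is not None
        gc.collect()  # an explicit collection breaks the cycle
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        gc.enable()


if __name__ == "__main__":
    print(cycle_survives_until_collect())
```

If memory keeps growing even after an explicit `gc.collect()`, the objects are probably still reachable from somewhere (for example a pending callback), and a manual collection will not help.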

The code looks good. The leak is probably inside Tornado.

I only stumbled over this line:

async_client = tornado.httpclient.AsyncHTTPClient()

Are you aware of the instantiation magic in this constructor? From the docs:

"""
The constructor for this class is magic in several respects:  It actually
creates an instance of an implementation-specific subclass, and instances
are reused as a kind of pseudo-singleton (one per IOLoop).  The keyword
argument force_instance=True can be used to suppress this singleton
behavior.  Constructor arguments other than io_loop and force_instance
are deprecated.  The implementation subclass as well as arguments to
its constructor can be set with the static method configure()
"""

So you don't actually need to do this inside the loop. (On the other hand, it should not do any harm.) But which implementation are you using: CurlAsyncHTTPClient or SimpleAsyncHTTPClient?

If it is SimpleAsyncHTTPClient, be aware of this comment in the code:
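The pseudo-singleton behavior described in the docstring can be pictured with a small standard-library sketch. This is not Tornado's implementation; `PseudoSingletonClient` and `FakeLoop` are hypothetical names, and the sketch only illustrates the idea of "one shared instance per loop unless `force_instance=True`".

```python
import weakref


class FakeLoop:
    """Hypothetical placeholder for an IOLoop object."""


class PseudoSingletonClient:
    """Illustration only: one shared instance per loop unless forced."""

    # Maps each loop to its shared client; entries vanish when the loop is freed.
    _instances = weakref.WeakKeyDictionary()

    def __new__(cls, loop, force_instance=False):
        if force_instance:
            # Caller explicitly asked for a private, uncached instance.
            return super().__new__(cls)
        instance = cls._instances.get(loop)
        if instance is None:
            instance = super().__new__(cls)
            cls._instances[loop] = instance
        return instance

    def __init__(self, loop, force_instance=False):
        # Nothing to set up in this sketch; the arguments are consumed by __new__.
        pass


if __name__ == "__main__":
    loop = FakeLoop()
    print(PseudoSingletonClient(loop) is PseudoSingletonClient(loop))
    print(PseudoSingletonClient(loop, force_instance=True) is PseudoSingletonClient(loop))
```

Under a scheme like this, constructing the client inside the `for url in workers` loop returns the same object each time, which is why creating it once before the loop is equivalent.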

"""
This class has not been tested extensively in production and
should be considered somewhat experimental as of the release of
tornado 1.2. 
"""

You can try switching to CurlAsyncHTTPClient. Or follow Nikolay Fominyh's suggestion and trace the calls to __callback().
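One way to trace the callback, sketched with the standard library only: wrap it so that every invocation is counted and logged, then compare the count with the number of workers when a timeout happens. The `traced` helper, the logger name, and the `expected_calls` parameter are assumptions made for this illustration, not part of Tornado.

```python
import functools
import logging

logger = logging.getLogger("callback_trace")


def traced(callback, expected_calls):
    """Wrap `callback` so each invocation is counted and logged."""
    state = {"calls": 0}

    @functools.wraps(callback)
    def wrapper(response):
        state["calls"] += 1
        # Responses without an `error` attribute are logged as error=None.
        logger.info(
            "callback %d/%d: error=%r",
            state["calls"],
            expected_calls,
            getattr(response, "error", None),
        )
        return callback(response)

    # Expose the counter so it can be inspected after the request finishes.
    wrapper.state = state
    return wrapper
```

In the handler from the question this would be used as `async_client.fetch(request, traced(__callback, len(workers)))`, creating the wrapper once before the loop so all fetches share one counter. If the logged count stays below the number of workers after a dropped connection, some fetches never reach the callback, and the closure (with its `responses` list) is never released by `_finish_req`.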
