Concurrency of heavy tasks in Tornado

Question

My code:

import itertools
import socket
import time

import tornado.gen
import tornado.ioloop
import tornado.iostream
import tornado.tcpserver

class Talk():
    def __init__(self, id):
        self.id = id

    @tornado.gen.coroutine
    def on_connect(self):
        try:
            while "connection alive":
                self.said = yield self.stream.read_until(b"\n") 

                response = yield tornado.gen.Task(self.task)     ### LINE 1

                yield self.stream.write(response)                   ### LINE 2

        except tornado.iostream.StreamClosedError:
            print('error: socket closed')
        return

    @tornado.gen.coroutine
    def task(self):
        if self.id == 1:
            time.sleep(3)  # sometimes request is heavy blocking
        return b"response"

    @tornado.gen.coroutine
    def on_disconnect(self):
        yield []


class Server(tornado.tcpserver.TCPServer):

    def __init__(self, io_loop=None, ssl_options=None, max_buffer_size=None):

        tornado.tcpserver.TCPServer.__init__(self,
            io_loop=io_loop,
            ssl_options=ssl_options,
            max_buffer_size=max_buffer_size)

        self.talk_id_alloc = itertools.count(1)
        return


    @tornado.gen.coroutine
    def handle_stream(self, stream, address):
        talk_id = next(self.talk_id_alloc)
        talk = Talk(talk_id)

        stream.set_close_callback(talk.on_disconnect)
        stream.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        # SO_KEEPALIVE is a socket-level option, so it is set at SOL_SOCKET
        stream.socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

        talk.stream = stream

        yield talk.on_connect()

        return

Server().listen(8888)
tornado.ioloop.IOLoop.instance().start()

Problem:

I need Tornado as a TCP server; it looks like a good choice for handling many requests that need little computation.

However:

  99% of requests will take less than 0.05 s, but
  1% of them can take as long as 3 s (special cases).
  A single response must be returned in one piece, not partially.

What is the best approach here? How can I write the code so that LINE 1 never blocks for more than 0.1 s?

 yield tornado.gen.with_timeout(
    datetime.timedelta(seconds=0.1), tornado.gen.Task(self.task))

This doesn't work for me; it does nothing.

 tornado.ioloop.IOLoop.current().add_timeout(
      datetime.timedelta(seconds=0.1),
      lambda: result.set_exception(TimeoutError("Timeout")))

This does nothing either.

Looking for better solutions:

The task could detect that it needs heavy computation (API ...), perhaps using a timeout, and then be run in (or forked to) another thread or even another process, raising an exception to the Tornado server that means "receive me later" from a results queue (consumer/producer). I don't want a timeout to kill a heavy task without saving its results, only for the task to be restarted inside a special wrapper. So should the consumer/producer pattern be used for all tasks?
Adding a new IOLoop when the current one is blocked. How do I detect blocking?

I don't see any solution in Tornado.

The task in LINE 1 could be simple (~99%) or complicated, in which case it may require:

 I/O:
 - disk/DB access
 - ram/redis access
 network:
 - API call
 CPU:
 - algorithms, regex

(The worst task will do all of the above.) I only learn what kind of task it is (its weight) once I start doing it, so a task queue served by separate threads seems appropriate. I don't want to delay the simple, quick tasks.

Answer 1

So if you can manage to cancel the heavy tasks, I recommend cancelling them with a time-out and then spawning them off to another thread. Performance-wise this is not ideal (because of the GIL), but it prevents Tornado from blocking, which is your ultimate goal.
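To illustrate why the thread matters, here is a minimal sketch, not the asker's server. It uses the standard library's asyncio instead of Tornado's own API (Tornado 5 and later runs on the asyncio event loop and offers the analogous IOLoop.run_in_executor). The names heavy_work, heavy_request, quick_request and measure are hypothetical, and the 3 s task is shortened to 0.3 s. It measures how long a quick request waits while a heavy one is in progress, first with the blocking call on the loop thread and then with it handed to a worker thread.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

HEAVY_SECONDS = 0.3  # assumption: shortened stand-in for the 3 s task


def heavy_work():
    time.sleep(HEAVY_SECONDS)  # blocking call, like time.sleep(3) in task()
    return b"response"


async def heavy_request(executor):
    if executor is None:
        # Runs on the event-loop thread: nothing else can run meanwhile.
        return heavy_work()
    loop = asyncio.get_running_loop()
    # Runs in a worker thread: the loop keeps serving other requests.
    return await loop.run_in_executor(executor, heavy_work)


async def quick_request(start):
    await asyncio.sleep(0)  # yield once, as a cheap request would on I/O
    return time.monotonic() - start


async def measure(executor=None):
    """Return how long the quick request took next to a heavy one."""
    start = time.monotonic()
    _, quick_latency = await asyncio.gather(
        heavy_request(executor), quick_request(start)
    )
    return quick_latency


if __name__ == "__main__":
    print("blocking on the loop:", asyncio.run(measure(None)))
    with ThreadPoolExecutor(max_workers=2) as pool:
        print("offloaded to a thread:", asyncio.run(measure(pool)))
```

In the first case the quick request cannot finish before the heavy one has returned, because a timeout or any other callback only runs when the loop gets control back. In the second case the quick request finishes almost immediately.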

A nice write-up about how this can be done can be found here: http://lbolla.info/blog/2013/01/22/blocking-tornado
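Independently of that write-up, the "time-out, then hand off" idea can be sketched so that a timed-out task keeps its result. This is an assumption-laden example using asyncio and concurrent.futures rather than Tornado's API; task, respond, demo and BUDGET are hypothetical names, and the slow case is shortened to 0.4 s. Every task starts in a worker thread, the caller waits up to the budget, and on timeout it simply keeps waiting on the same future instead of cancelling the work.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

BUDGET = 0.1  # seconds to wait before treating the task as heavy


def task(talk_id):
    # Hypothetical stand-in for the question's task(): only talk 1 is slow.
    if talk_id == 1:
        time.sleep(0.4)
    return b"response"


async def respond(executor, talk_id):
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(executor, task, talk_id)
    try:
        # shield() makes the timeout cancel only the wait, not the future.
        result = await asyncio.wait_for(asyncio.shield(future), timeout=BUDGET)
        return result, "fast"
    except asyncio.TimeoutError:
        # The worker thread is still running; its result is collected later,
        # and the event loop was never blocked in the meantime.
        result = await future
        return result, "slow"


async def demo():
    with ThreadPoolExecutor(max_workers=2) as executor:
        return await asyncio.gather(respond(executor, 1), respond(executor, 2))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The "slow" branch is where a server could log the request, move it to a results queue, or tell the client to come back later. Here it only awaits the same future, so the whole response is still sent in one piece.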

If you want to go further, you could use something like Celery, which lets you offload work to other processes transparently, though this is much heavier.

solution1
0 ACCEPTED 2014-07-28 07:43:23