Random Timeout Errors using heroku + gunicorn + aiohttp

Question

I've developed a service that acts as a gateway, redirecting requests to different micro-services. To do this I've used aiohttp to handle and redirect requests, gunicorn (with aiohttp.worker.GunicornWebWorker) to serve, and Heroku as the host.

Locally everything works perfectly: 100% of requests return a response and the client always receives the expected information. But when I deploy to Heroku and redirect some requests (5k per minute), I see between 3 and 7 requests fail with HTTP status 503 Timeout Error. It's not a big worry, because the proportion of successfully resolved requests is very high (99.9994), but I want to know what is happening. The exception raised just before the timeouts looks like this:

[2017-02-10 17:03:48 +0000] [683] [INFO] Worker exiting (pid: 683) 
ERROR:asyncio:Task was destroyed but it is pending! 
[2017-02-10 17:03:48 +0000] [683] [INFO] Stopping server: 683, connections: 1 
Exception ignored in: <generator object GunicornWebWorker._run at 0x7f18b1d2f518> 
Traceback (most recent call last): 
  yield from self.close() 
  yield from self.wsgi.shutdown() 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/web.py", line 199, in shutdown 
  yield from self.on_shutdown.send(self) 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/signals.py", line 48, in send 
  yield from self._send(*args, **kwargs) 
File "/app/.heroku/python/lib/python3.5/site-packages/aiohttp/signals.py", line 16, in _send 
  yield from res 
File "/app/app/app.py", line 14, in close_redis 
  app.redis_pool.close() 
File "/app/.heroku/python/lib/python3.5/site-packages/aioredis/pool.py", line 135, in close 
  self._close_state.set() 
File "/app/.heroku/python/lib/python3.5/asyncio/locks.py", line 242, in set 
  fut.set_result(True) 
File "/app/.heroku/python/lib/python3.5/asyncio/futures.py", line 332, in set_result 
  self._schedule_callbacks() 
File "/app/.heroku/python/lib/python3.5/asyncio/futures.py", line 242, in _schedule_callbacks 
  self._loop.call_soon(callback, self) 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 497, in call_soon 
  handle = self._call_soon(callback, args) 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 506, in _call_soon 
  self._check_closed() 
File "/app/.heroku/python/lib/python3.5/asyncio/base_events.py", line 334, in _check_closed 
  raise RuntimeError('Event loop is closed') 
RuntimeError: Event loop is closed 
ERROR:asyncio:Task was destroyed but it is pending! 
task: <Task pending coro=<ServerHttpProtocol.start() running at /app/.heroku/python/lib/python3.5/site-packages/aiohttp/server.py:261>>
[2017-02-10 17:03:48 +0000] [4] [CRITICAL] WORKER TIMEOUT (pid:683)

Then heroku/router shows an error like this:

at=error code=H12 desc="Request timeout" method=GET path="/users/21324/posts/" host=superapp.herokuapp.com request_id=209bd839-baac-4e72-a04e-657d85348f45 fwd="84.78.56.97" dyno=web.2 connect=0ms service=30000ms status=503 bytes=0

I'm running the app with:

gunicorn --pythonpath app  app.app:aio_app --worker-class aiohttp.worker.GunicornWebWorker --workers 3

The main code is:

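import asyncio

import rollbar
from aiohttp import web
from aioredis import create_pool

# settings, middlewares, routes and close_redis are defined elsewhere in the project.


def init(asyncio_loop):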
    app = web.Application(loop=asyncio_loop, middlewares=[middlewares.auth_middleware,
                                                          middlewares.logging_middleware])

    # INIT DBs
    app.redis_pool = asyncio_loop.run_until_complete(
        create_pool((settings.REDIS['host'], settings.REDIS['port']),
                    password=settings.REDIS['password']))

    # Clean connections on stop
    app.on_shutdown.append(close_redis)

    # Add rollbar
    rollbar.init(settings.ROLLBAR_TOKEN, 'production')  # access_token, environment

    # Bind routes
    for r in routes:
        app.router.add_route(r[0], r[1], r[2])

    return app


# Create app
loop = asyncio.get_event_loop()
aio_app = init(loop)

And a redirection example:

async with aiohttp.ClientSession() as s:
    try:
        async with s.request(method=method,
                             url=new_url,
                             headers=new_headers,
                             data=body,
                             allow_redirects=False,
                             timeout=25) as response:
            # Clean response
            resp_headers = MSRepository.filter_response_headers(response.headers)
            resp_body = (await response.read())

            return ResponseDataEntity(resp_headers, response.status, resp_body)
    except asyncio.TimeoutError:
        raise MSConnectionError("Request timeout")
    except Exception as e:
        rollbar.report_message(str(e), extra_data={
            "url": new_url,
            "data": body,
            "method": method,
            "headers": new_headers
        })
        raise MSConnectionError(str(e))

As you can see, the outgoing requests have a timeout of 25s, yet the error is raised at the 30s timeout.
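
The 25s timeout in the snippet covers only the upstream call, so time spent elsewhere in the handler is not counted against it. One way to make the gap to the 30s router limit explicit is to put a single deadline around the whole unit of work. The following is a minimal sketch using only asyncio, not the code from the question: `with_deadline`, `slow_upstream`, `demo`, and the budget constants are hypothetical names chosen for illustration, and `asyncio.run` assumes Python 3.7+ (the app above runs on 3.5).

```python
import asyncio

# Heroku's router gives up after 30 s (the H12 error above), so the total
# work per request needs a budget below that. Both values are assumptions.
ROUTER_LIMIT_S = 30.0
HANDLER_BUDGET_S = 25.0
assert HANDLER_BUDGET_S < ROUTER_LIMIT_S


async def with_deadline(coro, budget_s=HANDLER_BUDGET_S):
    """Await coro; return (True, result), or (False, None) on timeout."""
    try:
        result = await asyncio.wait_for(coro, timeout=budget_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner coroutine before raising.
        return False, None
    return True, result


async def slow_upstream(delay_s):
    """Hypothetical stand-in for the proxied micro-service call."""
    await asyncio.sleep(delay_s)
    return "ok"


async def demo():
    # Short delays keep the demo fast; real budgets would be in seconds.
    fast = await with_deadline(slow_upstream(0.01), budget_s=0.5)
    slow = await with_deadline(slow_upstream(0.5), budget_s=0.05)
    return fast, slow
```

Running `asyncio.run(demo())` exercises one call that finishes inside its budget and one that does not. In a real handler the timed-out case would be turned into an explicit error response rather than left for the router to cut off.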

Does anyone have a clue about what's happening?

(Note: when I write "redirect" I don't mean HTTP 302. I mean handling the request, editing headers, checking auth, making an async request to the appropriate MS, handling the response, and returning it.)

Answer 1

In the end the problem was in one of the handlers. I don't really know what was happening, because the timeouts were completely random across all endpoints, but after 6h running perfectly with more than 10k requests per minute I'm sure this was the cause. Here is the code before and after the fix:

async def bad_handler(request):
    # Body is read only on one branch below, not in all cases
    if '/event-log/action/' == request.path:
        if request.query_string != '':
            action_type = request.query_string
        else:
            try:
                request_body = await request.json()
                action_type = request_body.get('type', '')
            except:
                action_type = ''

        print("Action_type={}".format(action_type))

    # Response to client
    return aiohttp.web.json_response(data={}, status=200)

async def good_handler(request):
    # Read body in ALL cases to not to block requests
    try:
        request_body = await request.json()
    except:
        request_body = None

    if '/event-log/action/' == request.path:
        if request.query_string != '':
            action_type = request.query_string
        else:
            if request_body is not None:
                action_type = request_body.get('type', '')
            else:
                action_type = ''

        print("Action_type={}".format(action_type))

    # Response to client
    return aiohttp.web.json_response(data={}, status=200)

As you can see, the only difference is that in one case the body is always awaited and in the other it is not.

I'll leave the question open, hoping someone can explain why it's working now. :)
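
The "always read the body" pattern from good_handler can be pulled into a small helper so every handler consumes the body the same way. This is a sketch under assumptions, not part of aiohttp: `read_json_or_none` and `FakeRequest` are hypothetical names, and the helper relies only on the request exposing an awaitable `read()` that returns bytes, as aiohttp's request object does.

```python
import asyncio
import json


async def read_json_or_none(request):
    """Always consume the request body; return parsed JSON or None."""
    try:
        raw = await request.read()
    except Exception:
        # The body could not be read at all (e.g. client disconnected).
        return None
    if not raw:
        return None
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Body was present but is not valid UTF-8 JSON.
        return None


class FakeRequest:
    """Hypothetical stand-in for a request, used only to try the helper."""

    def __init__(self, body):
        self._body = body

    async def read(self):
        return self._body


async def demo():
    valid = await read_json_or_none(FakeRequest(b'{"type": "click"}'))
    empty = await read_json_or_none(FakeRequest(b""))
    broken = await read_json_or_none(FakeRequest(b"not json"))
    return valid, empty, broken
```

A handler would call `request_body = await read_json_or_none(request)` as its first step and branch on the path afterwards, which mirrors good_handler while avoiding the bare `except:`.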
