
Best way to re/use redis connections for prometheus django exporter

I am getting this error:

redis.exceptions.ConnectionError: Error 24 connecting to redis-service:6379. Too many open files.
...
OSError: [Errno 24] Too many open files

I know this can be fixed by increasing the ulimit, but I don't think that's the issue here, and this is a service running in a container. The application starts up correctly, works for 48 hours, and then I get the above error, which implies that connections are accumulating over time.

What my application is basically doing:

  • A background task (run using celery) collects data from postgres and sets it in redis.
  • Prometheus scrapes the app at '/metrics', which is a django view that reads the data from redis and serves it using the django prometheus exporter.

The code looks something like this:

views.py

from prometheus_client.core import GaugeMetricFamily, REGISTRY
from my_awesome_app.taskbroker.celery import app


class SomeMetricCollector:

    def get_sample_metrics(self):
        with app.connection_or_acquire() as conn:
            client = conn.channel().client
            result = client.get('some_metric_key')
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())

tasks.py

# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        client = conn.channel().client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True

I don't think the background data collection in tasks.py is causing the Redis connections to grow; rather, it's the Django view '/metrics' in views.py that is causing it.

Can you please tell me what I am doing wrong here, or whether there is a better way to read from Redis in a Django view? The Prometheus instance scrapes the Django application every 5s.
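For a sense of scale, a quick back-of-the-envelope calculation (assuming, purely for illustration, one leaked connection per scrape and a common default soft limit of 1024 open files — both numbers are assumptions, not from the question):

```python
SCRAPE_INTERVAL_S = 5      # Prometheus scrape interval mentioned above
FD_SOFT_LIMIT = 1024       # a common default "open files" ulimit (assumption)

scrapes_per_hour = 3600 // SCRAPE_INTERVAL_S       # 720
hours_to_limit = FD_SOFT_LIMIT / scrapes_per_hour  # ~1.4 hours at one leak per scrape
scrapes_in_48h = 48 * scrapes_per_hour             # 34560 scrapes before the crash

print(scrapes_per_hour, round(hours_to_limit, 1), scrapes_in_48h)  # 720 1.4 34560
```

The observed growth here is evidently slower than one leak per scrape (the error appears only after 48 hours), but the mechanism is the same: any per-scrape leak eventually exhausts the file-descriptor limit.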

This answer is based on my own use case and research.

The issue here, as I see it, is that each request to /metrics starts a new thread, in which views.py creates new connections in the Celery broker's connection pool.

This can be handled by letting Django manage its own Redis connection pool through the cache backend, and letting Celery manage its own Redis connection pool, so that neither uses the other's connections from its threads.

Django Side

config.py

# CACHES
# ------------------------------------------------------------------------------
# For more details on options for your cache backend please refer
# https://docs.djangoproject.com/en/3.1/ref/settings/#backend
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
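If you want an explicit upper bound on the pool, django-redis lets you pass keyword arguments through to the underlying redis-py ConnectionPool via CONNECTION_POOL_KWARGS. A sketch of that variant (the value 50 is an arbitrary example, not a recommendation):

```python
# Illustrative variant of the CACHES setting above: cap the pool size so a
# leak surfaces as a clear pool-exhaustion error instead of silently eating
# file descriptors until the OS limit is hit.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # Forwarded to redis.ConnectionPool; 50 is just an example cap.
            "CONNECTION_POOL_KWARGS": {"max_connections": 50},
        },
    }
}
```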

views.py

from prometheus_client.core import GaugeMetricFamily, REGISTRY
# *: Replacing celery app with Django cache backend
from django.core.cache import cache


class SomeMetricCollector:

    def get_sample_metrics(self):
        # *: This is how you will get the new client, which is still context managed.
        with cache.client.get_client() as client:
            # The raw redis client returns bytes (or None if the key is
            # missing); gauge values must be numeric.
            result = float(client.get('some_metric_key') or 0)
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)


REGISTRY.register(SomeMetricCollector())

This will ensure that Django maintains its own Redis connection pool and does not spin up new connections unnecessarily.

Celery Side

tasks.py

# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it's collecting data from postgres is trivial to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query


@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        # *: This will force celery to always look into the existing connection pool for connection.
        client = conn.default_channel.client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True

How do I monitor connections?

  • There is a nice prometheus celery exporter which will help you monitor your celery task activity; I am not sure how you would add connection pool and connection monitoring to it.
  • The easiest way to manually verify whether the connections grow every time /metrics is hit on the web app is:
     $ redis-cli
     127.0.0.1:6379> CLIENT LIST
  • The CLIENT LIST output will show you whether the number of connections is growing or not.
  • Sadly, I don't use queues, but I would recommend using them. This is how my worker runs:
     $ celery -A my_awesome_app.taskbroker worker --concurrency=20 -l ERROR -E
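To turn the manual CLIENT LIST check into something scriptable, a small helper can count connections per client host. This sketch assumes the list-of-dicts shape that redis-py's Redis.client_list() returns (each entry has an 'addr' field like '10.0.0.5:43210'); the sample addresses are made up:

```python
from collections import Counter

def connections_by_host(client_list):
    """Count open connections per client IP, given entries shaped like the
    dicts returned by redis-py's Redis.client_list()."""
    return Counter(entry["addr"].rsplit(":", 1)[0] for entry in client_list)

# Hard-coded sample entries; in practice pass redis.Redis(...).client_list()
# and compare snapshots taken a few minutes apart to spot growth.
sample = [
    {"addr": "10.0.0.5:43210"},
    {"addr": "10.0.0.5:43214"},
    {"addr": "10.0.0.9:50001"},
]
print(connections_by_host(sample))  # Counter({'10.0.0.5': 2, '10.0.0.9': 1})
```

If one host's count keeps climbing between snapshots while the workload is steady, that host is the leaker.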
