
How to make alert rules visible on Prometheus User Interface?

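I am trying to set up some alerting rules in Prometheus so that I get alerted when an instance goes down, but when I click the Rules icon in the Prometheus UI, I don't see the rules I configured for alerting.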

I am testing this locally on my computer, where I run Prometheus, Alertmanager, the Prometheus node_exporter, and a few other applications in Docker.

(Screenshots: Prometheus targets, Prometheus UI alerts, Alertmanager UI)

Please help...

The prometheus.yml file is shown below. PWD - /Users/spencer.ecas/ops/prometheus.yml

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s
  external_labels:
    monitor: 'spencer'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - alert.rules.yml
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'prometheus-server'
  - job_name: 'bis'
    scrape_interval: 5s
    metrics_path: /actor/prometheus
    static_configs:
      - targets: ['host.docker.internal:8790']
        labels:
          group: 'prometheus-bi-sanbox'
  - job_name: "node"
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:9100']
        labels:
          group: 'nodeexporter-server'

alert.rules.yml PWD - /Users/spencer.ecas/ops/prometheus/alert.rules.yml

groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minutes."
      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable / node_memory_MemTotal * 100 < 25
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail{mountpoint="/"} * 100) / node_filesystem_size{mountpoint="/"} < 50
        for: 1s
        labels:
          severity: warning
        annotations:
          summary: "Host out of disk space (instance {{ $labels.instance }})"
          description: "Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
      - alert: HostHighCpuLoad
        expr: (sum by (instance) (irate(node_cpu{job="node_exporter_metrics",mode="idle"}[5m]))) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

alertmanager.yml PWD - /Users/spencer.ecas/ops/alertmanager/alertmanager.yml

Here I am trying to forward the alerts to my Slack channel.

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T06J2AUUR/B03CYRJPBPC/HcgsYeG1jjbduwb"
        channel: '#alertmanager'
        send_resolved: true

Everything seems to be set up correctly, but the problem may lie in how you start the Prometheus and Alertmanager servers with the prometheus.yml file.

Secondly, are you sure Prometheus is actually reading the alert rules from the path given in your prometheus.yml file?

rule_files:
 - alert.rules.yml

So please edit the prometheus.yml file and use this path under rule_files instead:

rule_files:
 - "/etc/prometheus/alert.rules.yml"
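To see why the location of the rule file inside the container matters, here is a small Python sketch of how a rule_files entry maps to a path. It only models the behaviour (Prometheus takes a relative entry relative to the directory of the config file it loaded); resolve_rule_file is a hypothetical helper name, not part of Prometheus, and glob patterns are ignored.

```python
import posixpath


def resolve_rule_file(config_path: str, entry: str) -> str:
    """Return the path a rule_files entry refers to, given the config file location."""
    if posixpath.isabs(entry):
        return entry
    # Relative entries are taken relative to the directory holding the config file.
    return posixpath.join(posixpath.dirname(config_path), entry)


if __name__ == "__main__":
    # Assumed config location inside the container (the image's default path).
    config = "/etc/prometheus/prometheus.yml"
    for entry in ("alert.rules.yml", "/etc/prometheus/alert.rules.yml"):
        print(entry, "->", resolve_rule_file(config, entry))
```

Either way, the file has to exist at that path inside the container, not just on the host, which is why the rule file needs to be mounted as well.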

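I suggest you remove the alertmanager and prometheus containers and use the command below. The reason for spinning up the prometheus container together with the alert.rules.yml location is that the rule file then persists on the prometheus container, and it is the Prometheus server that evaluates the rules to trigger alerts.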

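Make sure the directory exists like this before running the command. You should have the prometheus.yml file in /Users/spencer.ecas/ops/prometheus. Because the command uses $(pwd), run it from that directory, with alert.rules.yml alongside.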

docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus

This is just a more readable layout of the command above; treat the two as identical.

docker run -d --name prometheus_ops -p 9191:9090 \
  -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml \
  prom/prometheus
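Once the container is up, one way to confirm that the rules were loaded without opening the UI is to ask the Prometheus HTTP API for its rule groups. The Python sketch below assumes the server is reachable at http://localhost:9191 (the port published by the docker run command); alert_names and fetch_rules are hypothetical helper names chosen for illustration.

```python
import json
import urllib.request


def alert_names(payload: dict) -> list:
    """Collect the names of alerting rules from a /api/v1/rules response body."""
    names = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have type "recording"; keep alerting rules only.
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


def fetch_rules(base_url: str) -> dict:
    """Fetch the currently loaded rule groups from a Prometheus server."""
    with urllib.request.urlopen(base_url + "/api/v1/rules", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumption: Prometheus from the command above is published on port 9191.
    print(alert_names(fetch_rules("http://localhost:9191")))
```

If the printed list is empty, the rule file was most likely not found or not mounted; if it contains the alert names from alert.rules.yml, the same rules should also show up in the UI.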
