Traefik metrics working for Prometheus but Grafana dashboards are empty

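I have configured Traefik (v1.7.15) with the Prometheus Operator from the stable Helm charts (chart version 8.2.4).

However, I can't see any metrics data in the Grafana dashboards; they are all empty.

I can see the metrics exposed on the pod IP at port 8080 with curl. See the metrics excerpt and the relevant configuration manifests below.

I can also see that the Traefik ServiceMonitor target is in the UP state in Prometheus. I used the same approach for the MongoDB/Postgres/RabbitMQ metrics, and those Grafana dashboards are populated with data and work fine.

Could someone point me in the right direction to fix this and get the Traefik ingress controller metrics displayed in Grafana? Please also let me know what the cause is.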

I am using the following Grafana dashboards, but none of them shows any data. Dashboard IDs: 4475, 8214, 11741, 6293

Thank you

Traefik configuration:

Deployment YAML args

    ports:
    - name: http
      containerPort: 80
    - name: admin
      containerPort: 8080
    - name: https
      containerPort: 443
    args:
    #- --api
    - --web
    - --web.metrics.prometheus
    - --kubernetes
    - --logLevel=INFO
    - --configfile=/config/traefik.toml
    volumeMounts:
    - mountPath: /config
      name: config
    - mountPath: /ssl
      name: ssl

ConfigMap TOML file

  traefik.toml: |
    # traefik.toml
    logLevel = "INFO"
    defaultEntryPoints = ["http","https"]
    [entryPoints]
      [entryPoints.http]
      address = ":80"
      [entryPoints.http.redirect]
      entryPoint = "https"
      [entryPoints.https]
      address = ":443"
      [entryPoints.https.tls]
        [[entryPoints.https.tls.certificates]]
        CertFile = "/ssl/tls.crt"
        KeyFile = "/ssl/tls.key"
    [metrics]
      [metrics.prometheus]
        buckets = [0.1,0.3,1.2,5.0]

Prometheus ServiceMonitor YAML

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
    name: traefik-sm
    labels:
        release: my-prometheus
spec:
    selector:
      matchLabels:
        k8s-app: traefik-ingress-lb
    namespaceSelector:
      any: true
    endpoints:
    - port: admin-ui
      name: traefik-ingress-service
      targetPort: 8080
      path: /metrics
      interval: 10s
      honorLabels: true

Traefik metrics with curl

ubuntu@k8s-node1:~$ curl http://10.96.1.141:8080/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.3978e-05
go_gc_duration_seconds{quantile="0.25"} 1.86e-05
go_gc_duration_seconds{quantile="0.5"} 2.3194e-05
go_gc_duration_seconds{quantile="0.75"} 5.2525e-05
go_gc_duration_seconds{quantile="1"} 0.090356709
go_gc_duration_seconds_sum 12.978064956
go_gc_duration_seconds_count 3774
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 64
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 8.322768e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 2.7448991752e+10
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.579943e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.5932029e+08
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.00037814152889298634
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.4064e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 8.322768e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 5.3641216e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.261568e+07
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 54120
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 4.636672e+07
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6256896e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5858102844353108e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.5937441e+08
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 180000
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 245760
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.6043632e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 666961
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 851968
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 851968
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2024312e+07
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 11
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 553.04
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 6.9451776e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58573313806e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.90099456e+08
# HELP traefik_backend_server_up Backend server is up, described by gauge value of 0 or 1.
# TYPE traefik_backend_server_up gauge
traefik_backend_server_up{backend="auth-jooqa.abc.com/",url="http://192.168.22.77:8180"}
# HELP traefik_config_last_reload_failure Last config reload failure
# TYPE traefik_config_last_reload_failure gauge
traefik_config_last_reload_failure 0
# HELP traefik_config_last_reload_success Last config reload success
# TYPE traefik_config_last_reload_success gauge
traefik_config_last_reload_success 1.585741581e+09
# HELP traefik_config_reloads_failure_total Config failure reloads
# TYPE traefik_config_reloads_failure_total counter
traefik_config_reloads_failure_total 0
# HELP traefik_config_reloads_total Config reloads
# TYPE traefik_config_reloads_total counter
traefik_config_reloads_total 4

Metrics exported by Traefik

If you check the exported metrics, you will find that there are very few of them:

$ curl -s http://10.96.1.141:8080/metrics | grep -P '^traefik_'

traefik_backend_server_up{backend="auth-jooqa.abc.com/",url="http://192.168.22.77:8180"}
traefik_config_last_reload_failure 0
traefik_config_last_reload_success 1.585741581e+09
traefik_config_reloads_failure_total 0
traefik_config_reloads_total 4
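
To compare this list with other lists later, it helps to reduce the output to bare metric names. A minimal sketch, assuming the input is Prometheus text exposition format on stdin; the function name `traefik_metric_names` is hypothetical:

```shell
#!/bin/sh
# Print the distinct traefik_* metric names found in Prometheus text
# exposition format read from stdin. Comment lines (# HELP, # TYPE)
# do not start with "traefik_", so the grep skips them.
traefik_metric_names() {
  grep -E '^traefik_' |       # keep only Traefik sample lines
    sed -E 's/[{ ].*$//' |    # drop labels and values, keep the name
    sort -u                   # one line per distinct name
}

# Usage against the pod from the question (needs network access):
#   curl -s http://10.96.1.141:8080/metrics | traefik_metric_names
```

Reading from stdin keeps the function independent of where the metrics come from, so it works equally on a saved file.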

It is hard to find a ready-made Grafana dashboard for your set of metrics

Let's grep the metric names mentioned in the expr fields of your dashboards (4475, 8214, 11741 and 6293, for example https://grafana.com/grafana/dashboards/6293):

for dashboard_url in \
    'https://grafana.com/api/dashboards/4475/revisions/4/download' \
    'https://grafana.com/api/dashboards/6293/revisions/2/download' \
    'https://grafana.com/api/dashboards/8214/revisions/1/download' \
    'https://grafana.com/api/dashboards/11741/revisions/1/download' ; do
  # printf expands \t in every shell; a plain echo in bash would print it literally
  printf '\t = Dashboard: %s = \n' "$dashboard_url"
  # only the first target of each panel is inspected
  curl -s "$dashboard_url" | jq '.panels[].targets[0].expr' | grep -Po 'traefik_[a-z_]+' | sort -u
done

The command above returns the list of traefik_* metrics used in the expr fields of each dashboard:

         = Dashboard: https://grafana.com/api/dashboards/4475/revisions/4/download =
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_backend_server_up
traefik_config_reloads_total
traefik_entrypoint_requests_total
         = Dashboard: https://grafana.com/api/dashboards/6293/revisions/2/download =
traefik_backend_open_connections
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_entrypoint_open_connections
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
         = Dashboard: https://grafana.com/api/dashboards/8214/revisions/1/download =
traefik_backend_request_duration_seconds_sum
traefik_backend_requests_total
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
         = Dashboard: https://grafana.com/api/dashboards/11741/revisions/1/download =
traefik_entrypoint_open_connections
traefik_entrypoint_request_duration_seconds_sum
traefik_entrypoint_requests_total
traefik_service_open_connections
traefik_service_request_duration_seconds_count
traefik_service_request_duration_seconds_sum
traefik_service_requests_total

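As you can see, only two of the five metrics your Traefik exports (traefik_backend_server_up and traefik_config_reloads_total) are used at all, and only by dashboard 4475. Every other panel queries a metric that your instance does not expose, so it stays empty.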
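
The same comparison can be done mechanically rather than by eye. A minimal sketch, assuming the exported metric names and one dashboard's metric names have each been saved to a file, one name per line; the function name `common_metrics` and the file names are hypothetical:

```shell
#!/bin/sh
# Print the metric names that occur in both files, one per line.
#   $1: file with the names exported by Traefik
#   $2: file with the names a dashboard queries
common_metrics() {
  # -F: treat patterns as fixed strings, -x: match whole lines only,
  # -f: read the patterns (the exported names) from file $1
  grep -Fx -f "$1" "$2" | sort -u
}

# Usage:
#   common_metrics exported.txt dashboard-4475.txt
```

The `-x` flag matters here: without it, `traefik_backend_requests_total` would count as a match for any longer name that merely contains it.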

Let's try to find a suitable dashboard

Since these 4 dashboards do not fit your set of metrics, let's try to find a suitable one on GitHub:

  • traefik_backend_server_up: 8 code results
  • traefik_backend_server_up traefik_config_reloads_total: 11 code results
  • traefik_config_last_reload_failure OR traefik_config_last_reload_success OR traefik_config_reloads_failure_total: 1 code result

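Suggestions

So, I'd suggest:

  • either try updating Traefik so that it exposes a more current set of metrics
  • or create your own dashboard, which is quite easy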
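
For the second option, a dashboard can start out as a small JSON file with one panel per metric that Traefik actually exports. The sketch below prints such a file. The function name `make_dashboard`, the dashboard title, the `timeseries` panel type and the grid sizes are all illustrative assumptions, and the result may need adjusting for your Grafana version before you import it:

```shell
#!/bin/sh
# Emit a minimal Grafana dashboard JSON on stdout with one panel per
# metric name passed as an argument. Metric names are assumed to consist
# of [a-z_] only, so no JSON escaping is performed.
make_dashboard() {
  printf '{\n  "title": "Traefik (exported metrics only)",\n  "panels": [\n'
  id=0
  for metric in "$@"; do
    id=$((id + 1))
    # separate panels with a comma, but not before the first one
    if [ "$id" -gt 1 ]; then
      printf ',\n'
    fi
    printf '    {"id": %d, "type": "timeseries", "title": "%s",' "$id" "$metric"
    # stack the panels vertically, 8 grid rows each (assumed layout)
    printf ' "gridPos": {"h": 8, "w": 12, "x": 0, "y": %d},' "$(( (id - 1) * 8 ))"
    printf ' "targets": [{"expr": "%s"}]}' "$metric"
  done
  printf '\n  ]\n}\n'
}

# Usage:
#   make_dashboard traefik_backend_server_up traefik_config_reloads_total > traefik.json
```

Each panel queries the raw metric. For counters such as traefik_config_reloads_total you would typically replace the expression with a rate or increase over a time window once the panel shows data.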

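P.S. grafana-dashboard-builder makes it easier to create Grafana dashboards

There is an open-source tool that makes creating dashboards easier:

jakubplichta/grafana-dashboard-builder: Generate Grafana dashboards using YAML

It currently supports three datastores: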

  • Graphite
  • Prometheus
  • InfluxDB
