
Prometheus: cannot export metrics from connected Kubernetes cluster

The issue: I have a Prometheus server outside of the Kubernetes cluster, and I want to scrape metrics from that remote cluster.

I took the config sample from the Prometheus GitHub repo and modified it a little. Here is my jobs config:

  - job_name: 'kubernetes-apiservers'

    scheme: http

    kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;http

  - job_name: 'kubernetes-nodes'

    scheme: http

    kubernetes_sd_configs:
    - role: node
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-service-endpoints'

    scheme: http

    kubernetes_sd_configs:
    - role: endpoints
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (http?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: (.+)(?::\d+);(\d+)
      replacement: $1:$2
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name

  - job_name: 'kubernetes-services'

    scheme: http

    metrics_path: /probe
    params:
      module: [http_2xx]

    kubernetes_sd_configs:
    - role: service
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true
    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox
    - source_labels: [__param_target]
      target_label: instance
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_service_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name

  - job_name: 'kubernetes-pods'

    scheme: http

    kubernetes_sd_configs:
    - role: pod
      api_server: http://cluster-manager.dev.example.net:8080

    bearer_token_file: /opt/prometheus/prometheus/kube_tokens/dev
    tls_config:
      insecure_skip_verify: true

    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
      action: replace
      regex: (.+):(?:\d+);(\d+)
      replacement: ${1}:${2}
      target_label: __address__
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod_name

I don't use a TLS connection to the API, so I want to disable certificate verification.

When I curl the /metrics URL from the Prometheus host, it prints them.

Finally I connected to the cluster, but... the jobs are not up, so Prometheus doesn't expose the relabeled metrics.

What I see in the console (screenshots omitted).

Targets state (screenshot omitted).

I also checked the Prometheus debug log. It seems the system gets all the necessary information and the requests complete successfully.

time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"UDP\", \"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-local\"}, model.LabelSet{\"__address__\":\"10.32.0.2:10053\", \"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10053\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp-local\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_name\":\"kube-dns\", \"__meta_kubernetes_pod_container_port_number\":\"10055\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10055\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns\", \"__meta_kubernetes_pod_container_port_protocol\":\"UDP\"}, model.LabelSet{\"__address__\":\"10.32.0.2:53\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq\", \"__meta_kubernetes_pod_container_port_number\":\"53\", \"__meta_kubernetes_pod_container_port_name\":\"dns-tcp\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_number\":\"10054\", \"__meta_kubernetes_pod_container_port_name\":\"metrics\", \"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:10054\", \"__meta_kubernetes_pod_container_name\":\"dnsmasq-metrics\"}, model.LabelSet{\"__meta_kubernetes_pod_container_port_protocol\":\"TCP\", \"__address__\":\"10.32.0.2:8080\", 
\"__meta_kubernetes_pod_container_name\":\"healthz\", \"__meta_kubernetes_pod_container_port_number\":\"8080\", \"__meta_kubernetes_pod_container_port_name\":\"\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"kube-system\\\",\\\"name\\\":\\\"kube-dns-2924299975\\\",\\\"uid\\\":\\\"fa808d95-d7d9-11e6-9ac9-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"89\\\"}}\\n\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_affinity\":\"{\\\"nodeAffinity\\\":{\\\"requiredDuringSchedulingIgnoredDuringExecution\\\":{\\\"nodeSelectorTerms\\\":[{\\\"matchExpressions\\\":[{\\\"key\\\":\\\"beta.kubernetes.io/arch\\\",\\\"operator\\\":\\\"In\\\",\\\"values\\\":[\\\"amd64\\\"]}]}]}}}\", \"__meta_kubernetes_pod_name\":\"kube-dns-2924299975-dksg5\", \"__meta_kubernetes_pod_ip\":\"10.32.0.2\", \"__meta_kubernetes_pod_label_k8s_app\":\"kube-dns\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"2924299975\", \"__meta_kubernetes_pod_label_tier\":\"node\", \"__meta_kubernetes_pod_annotation_scheduler_alpha_kubernetes_io_tolerations\":\"[{\\\"key\\\":\\\"dedicated\\\",\\\"value\\\":\\\"master\\\",\\\"effect\\\":\\\"NoSchedule\\\"}]\", \"__meta_kubernetes_namespace\":\"kube-system\", \"__meta_kubernetes_pod_node_name\":\"cluster-manager.dev.example.net\", \"__meta_kubernetes_pod_label_component\":\"kube-dns\", \"__meta_kubernetes_pod_label_kubernetes_io_cluster_service\":\"true\", \"__meta_kubernetes_pod_host_ip\":\"54.194.166.39\", \"__meta_kubernetes_pod_label_name\":\"kube-dns\"}, Source:\"pod/kube-system/kube-dns-2924299975-dksg5\"}" 
time="2017-01-25T06:58:04Z" level=debug msg="pod update" kubernetes_sd=pod source="pod.go:66" tg="&config.TargetGroup{Targets:[]model.LabelSet{model.LabelSet{\"__address__\":\"10.43.0.0\", \"__meta_kubernetes_pod_container_name\":\"bot\"}}, Labels:model.LabelSet{\"__meta_kubernetes_pod_host_ip\":\"172.17.101.25\", \"__meta_kubernetes_pod_label_app\":\"bot\", \"__meta_kubernetes_namespace\":\"default\", \"__meta_kubernetes_pod_name\":\"bot-272181271-pnzsz\", \"__meta_kubernetes_pod_ip\":\"10.43.0.0\", \"__meta_kubernetes_pod_node_name\":\"ip-172-17-101-25\", \"__meta_kubernetes_pod_annotation_kubernetes_io_created_by\":\"{\\\"kind\\\":\\\"SerializedReference\\\",\\\"apiVersion\\\":\\\"v1\\\",\\\"reference\\\":{\\\"kind\\\":\\\"ReplicaSet\\\",\\\"namespace\\\":\\\"default\\\",\\\"name\\\":\\\"bot-272181271\\\",\\\"uid\\\":\\\"c297b3c2-e15d-11e6-a28a-02dfdae1a1e9\\\",\\\"apiVersion\\\":\\\"extensions\\\",\\\"resourceVersion\\\":\\\"1465127\\\"}}\\n\", \"__meta_kubernetes_pod_ready\":\"true\", \"__meta_kubernetes_pod_label_pod_template_hash\":\"272181271\", \"__meta_kubernetes_pod_label_version\":\"v0.1\"}, Source:\"pod/default/bot-272181271-pnzsz\"}" 

Prometheus fetches the updates, but... doesn't convert them into metrics. I've racked my brain trying to figure out why it behaves this way. So please help if you can spot where the mistake might be.

If you want to monitor a Kubernetes cluster from an external Prometheus server, I would suggest setting up a Prometheus federation topology:

  • Inside the K8s cluster, install node-exporter pods and a Prometheus instance with short-term storage.
  • Expose the Prometheus service outside of the K8s cluster, either via an ingress controller (LB) or a node port. You can protect this endpoint with HTTPS + basic authentication.
  • Configure the central Prometheus to scrape metrics from the above endpoint with proper authentication and tags.
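The central scrape job in the last step can use the `/federate` endpoint of the in-cluster Prometheus. A minimal sketch (the hostname, credentials, and `cluster` label below are placeholders, not taken from the original setup):

```yaml
scrape_configs:
  - job_name: 'k8s-federation'
    honor_labels: true          # keep the labels as set by the in-cluster Prometheus
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'         # pull every series the in-cluster Prometheus holds
    scheme: https
    basic_auth:
      username: prom
      password: <secret>
    static_configs:
      - targets: ['k8s-prometheus.dev.example.net:443']
        labels:
          cluster: dev          # tag which cluster the series came from
```

Narrowing the `match[]` selector (e.g. to `{job="kubernetes-nodes"}`) keeps the federation payload small.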

This is a scalable solution. You can add as many monitored K8s clusters as you want, until the load reaches the capacity of the central Prometheus. Then you can add another central Prometheus instance to monitor the others.

In the end I came to the conclusion that it's not trivial to set up Kubernetes cluster monitoring from outside the cluster, because the Kubernetes architecture assumes all infrastructure is kept within one local network, so every workaround is going to be messy. I also ran into the problem of trying to debug why the configured jobs for Kubernetes roles such as nodes, pods, services, and endpoints don't even show up on the targets status page. I may be wrong, but I didn't find a way to debug this issue in Prometheus.

My solution for monitoring the Kubernetes cluster from outside was kube-api-exporter: a pretty simple Python script which gets all metrics about DaemonSets, Deployments, and Pods, and exposes a URL to fetch them. So I'd recommend this solution to anyone who gets stuck with this sort of integration.
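The general shape of such an exporter is straightforward. Below is a minimal sketch of the idea, not the actual kube-api-exporter code: it reads Deployments from the Kubernetes API (the API server URL, port, and metric name are assumptions) and re-exposes the replica counts in the Prometheus text exposition format.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder: an insecure API endpoint, as in the question's setup.
API_SERVER = "http://cluster-manager.dev.example.net:8080"

def deployment_metrics(deployments_json):
    """Turn a Deployments list response into Prometheus exposition lines."""
    lines = ["# TYPE kube_deployment_replicas gauge"]
    for item in deployments_json.get("items", []):
        meta = item["metadata"]
        replicas = item.get("status", {}).get("replicas", 0)
        lines.append(
            'kube_deployment_replicas{namespace="%s",deployment="%s"} %d'
            % (meta["namespace"], meta["name"], replicas)
        )
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Fetch all Deployments from the API server and re-render them.
        url = API_SERVER + "/apis/apps/v1/deployments"
        with urllib.request.urlopen(url) as resp:
            body = deployment_metrics(json.load(resp)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

def main(port=9102):
    HTTPServer(("", port), MetricsHandler).serve_forever()

# main()  # uncomment to serve /metrics on :9102
```

The external Prometheus then scrapes this script with a plain `static_configs` job, sidestepping service discovery against the remote API entirely.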

I also started to fetch metrics from etcd. It's nice that etcd provides Prometheus-style metrics out of the box.
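Since etcd serves /metrics on its regular client port, a plain static job is enough. A sketch, assuming the standard client port 2379 (the hostnames are placeholders):

```yaml
scrape_configs:
  - job_name: 'etcd'
    static_configs:
      - targets:
          - 'etcd-1.dev.example.net:2379'
          - 'etcd-2.dev.example.net:2379'
          - 'etcd-3.dev.example.net:2379'
```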

PS: thanks to FuzzyAmi for the help.
