Wrong Prometheus endpoint for checking AlertManager

Question

I installed Prometheus by following this guide: https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/

But when I check the status of Targets, the AlertManager service shows as "Down", while every other endpoint is up. Please see the attached file.

Then I checked Service Discovery. The discovered labels show:

__address__="192.168.180.254:9093"
__meta_kubernetes_endpoint_address_target_kind="Pod"
__meta_kubernetes_endpoint_address_target_name="alertmanager-6c666985cc-54rjm"
__meta_kubernetes_endpoint_node_name="worker-node1"
__meta_kubernetes_endpoint_port_protocol="TCP"
__meta_kubernetes_endpoint_ready="true"
__meta_kubernetes_endpoints_name="alertmanager"
__meta_kubernetes_namespace="monitoring"
__meta_kubernetes_pod_annotation_cni_projectcalico_org_podIP="192.168.180.254/32"
__meta_kubernetes_pod_annotationpresent_cni_projectcalico_org_podIP="true"
__meta_kubernetes_pod_container_name="alertmanager"
__meta_kubernetes_pod_container_port_name="alertmanager"
__meta_kubernetes_pod_container_port_number="9093"

But the Target Labels show a different port (8080), and I don't know why:

 instance="192.168.180.254:8080" job="kubernetes-service-endpoints" kubernetes_name="alertmanager" kubernetes_namespace="monitoring"

Answer 1

First, if you want to install Prometheus and Grafana without a headache, you should do it through Helm.

First install Helm.

And then:

helm install installationWhatEverName stable/prometheus-operator

Answer 2

I've reproduced your issue on GCE.

If you are using Kubernetes 1.16+, you have probably already changed the apiVersion, because the tutorial defines the Deployment under extensions/v1beta1. Since K8s 1.16 you need to change it to apiVersion: apps/v1. Otherwise you will get an error like:

error: unable to recognize "STDIN": no matches for kind "Deployment" in version "extensions/v1beta1"

Second, with apps/v1 (required on 1.16+) you need to specify a selector. If you don't, you will receive another error:

`error: error validating "STDIN": error validating data: ValidationError(Deployment.spec): missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec; if you choose to ignore these errors, turn validation off with --validate=false`

It would look like:

...
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
...
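
For reference, the two changes above (the new apiVersion and the added selector) can be combined into one manifest. This is only a minimal sketch: the Deployment name, namespace, container name, image and port are assumptions for illustration, not values confirmed by the asker's files.

```yaml
apiVersion: apps/v1              # replaces extensions/v1beta1 on Kubernetes 1.16+
kind: Deployment
metadata:
  name: prometheus-deployment    # assumed name
  namespace: monitoring          # assumed namespace
spec:
  replicas: 1
  selector:                      # required by apps/v1
    matchLabels:
      app: prometheus-server     # must match the pod template labels below
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus       # assumed container name
          image: prom/prometheus # assumed image; pin a version tag in practice
          ports:
            - containerPort: 9090
```

The selector's matchLabels has to match the labels in the pod template, otherwise the API server rejects the Deployment.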

Regarding port 8080, please check this article, which includes an example.

Port: port is the port number that makes a service visible to other services running within the same K8s cluster. In other words, if a service wants to invoke another service running within the same Kubernetes cluster, it can do so using the port specified against "port" in the service spec file.
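
To make the port discussion concrete, here is a minimal sketch of a Service for Alertmanager. The Service name, pod label and annotation values are assumptions and are not taken from the asker's manifests. It also assumes the kubernetes-service-endpoints scrape job follows the example Kubernetes configuration shipped with Prometheus, where a relabel rule reads `__meta_kubernetes_service_annotation_prometheus_io_port` and rewrites the port in `__address__`.

```yaml
# Hypothetical Service for Alertmanager; names and values are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitoring
  annotations:
    prometheus.io/scrape: "true"
    # Should match the port the container listens on (9093 in the discovered
    # labels). With the relabel rule described above, a value of "8080" here
    # would make Prometheus scrape <pod-ip>:8080 instead.
    prometheus.io/port: "9093"
spec:
  selector:
    app: alertmanager            # assumed pod label
  ports:
    - port: 9093                 # port other in-cluster clients use to reach the Service
      targetPort: 9093           # container port the traffic is forwarded to
```

If the Service in use carries a prometheus.io/port annotation of 8080 while the pod listens on 9093, that would be consistent with the instance="192.168.180.254:8080" target shown in the question, so the annotation is worth checking first.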

It worked in my environment on GCE. Did you configure the firewall for your endpoints?

In addition, some hooks were deprecated in Helm 3. You can find this information here.

If you still have the issue, please provide your YAMLs with the changes you applied for version 1.16+.

Answer 1
1 Accepted 2019-12-11 14:49:41

Answer 2
1 2019-12-30 12:45:26
