Prometheus node-exporter on Kubernetes
I have deployed Prometheus on a Kubernetes cluster (EKS). I was able to scrape prometheus and traefik successfully with the following configuration:
scrape_configs:
  # A scrape configuration containing exactly one endpoint to scrape:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['prometheus.kube-monitoring.svc.cluster.local:9090']
  - job_name: 'traefik'
    static_configs:
      - targets: ['traefik.kube-system.svc.cluster.local:8080']
But node-exporter, deployed as a DaemonSet with the following definition, is not exposing the node metrics.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      name: node-exporter
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.18.1
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: scrape
          resources:
            requests:
              memory: 30Mi
              cpu: 100m
            limits:
              memory: 50Mi
              cpu: 200m
          volumeMounts:
            - name: proc
              readOnly: true
              mountPath: /host/proc
            - name: sys
              readOnly: true
              mountPath: /host/sys
      tolerations:
        - effect: NoSchedule
          operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
The corresponding scrape_configs in Prometheus are:
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.kube-monitoring.svc.cluster.local:9100
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
I also tried to curl http://localhost:9100/metrics from one of the containers, but got curl: (7) Failed to connect to localhost port 9100: Connection refused
What am I missing in the configuration?
Following the suggestion to install Prometheus with Helm, I installed it on a test cluster and compared my original configuration with the Helm-installed Prometheus.
The following pods are running:
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          4m33s
prometheus-grafana-66c7bcbf4b-mh42x                      2/2     Running   0          4m38s
prometheus-kube-state-metrics-7fbb4697c-kcskq            1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-6bf9f                1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-gbrzr                1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-j6l9h                1/1     Running   0          4m38s
prometheus-prometheus-oper-operator-648f9ddc47-rxszj     1/1     Running   0          4m38s
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   0          4m23s
I did not find any configuration for node-exporter in /etc/prometheus/prometheus.yml inside the pod prometheus-prometheus-prometheus-oper-prometheus-0.
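The earlier suggestion to use Helm works very well, and I would recommend it too.
Regarding your question: the problem is that you are not scraping the nodes directly; you are using node-exporter for that. So role: node is incorrect, and you should use role: endpoints instead. For that to work, you also need to create a Service that covers all pods of the DaemonSet.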
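As a minimal sketch of such a Service, assuming the DaemonSet from the question (namespace kube-monitoring, pod label app: node-exporter, container port 9100): the Service name, its app label, and the port name metrics are illustrative choices, not something taken from the original setup.

```yaml
# Hypothetical headless Service selecting the node-exporter DaemonSet pods.
apiVersion: v1
kind: Service
metadata:
  name: node-exporter            # assumed name
  namespace: kube-monitoring
  labels:
    app: node-exporter           # assumed label, usable in a relabel "keep" rule
spec:
  clusterIP: None                # headless: one endpoint per DaemonSet pod
  selector:
    app: node-exporter           # matches the pod label in the DaemonSet template
  ports:
    - name: metrics              # assumed port name, referenced by the scrape job
      port: 9100
      targetPort: 9100
```

A headless Service is used here because Prometheus only needs the per-pod endpoints, not a load-balanced virtual IP. Whatever label and port name you choose must match what the scrape job's keep rules expect.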
Here is a working example from my environment (installed by Helm):
- job_name: monitoring/kube-prometheus-exporter-node/0
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace
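Adapted to the setup in the question, the same idea might look roughly like the sketch below. It assumes a Service in kube-monitoring that carries the label app: node-exporter and exposes a port named metrics; both of those, and the job name, are assumptions for illustration and should be changed to whatever your Service actually uses.

```yaml
# Sketch only: scrape node-exporter through Service endpoints instead of role: node.
scrape_configs:
  - job_name: 'node-exporter'              # assumed job name
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kube-monitoring
    relabel_configs:
      # Keep only endpoints whose Service has the label app=node-exporter (assumed label).
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: node-exporter
        action: keep
      # Keep only the endpoint port named "metrics" (assumed port name).
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
      # Record which node each exporter pod runs on.
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
```

Prometheus also needs RBAC permission to list and watch endpoints, services, and pods in that namespace for this discovery role to return targets.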
How did you deploy Prometheus? Whenever I use the Helm chart ( https://github.com/helm/charts/tree/master/stable/prometheus ), node-exporter is deployed along with it. Maybe that is a simpler solution.
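I am stuck at a similar place. In my case, though, node-exporter is not part of the Helm deployment, because we get node-exporter as an add-on from Tanzu Kubernetes Grid (the k8s cluster). So I created a ServiceMonitor, and now service discovery shows the count I expect. But the Targets section shows 0/4. I cannot see the node metrics, although I do see data when I curl localhost:9100/metrics. There is some piece of logic I am missing.
I checked the configuration of the Helm-deployed node-exporter and it looks the same, so what am I missing here?
The indentation may be off, because this was copy-pasted on a mobile device.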
- job_name: node-exporter
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace
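One thing worth checking when targets are discovered but none are kept: the two keep rules above drop every endpoint whose Service does not carry the label app=exporter-node or whose port is not named metrics. A ServiceMonitor that would generate matching rules might look like the sketch below. The namespace, the app label value, the port name, and the release label are all assumptions here and must match the Service that the add-on actually creates and the selector your Prometheus resource uses.

```yaml
# Sketch only: every name and label below is an assumption to be checked against the real Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    release: prometheus          # assumed; must match the Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - monitoring               # namespace where the node-exporter Service lives
  selector:
    matchLabels:
      app: exporter-node         # must equal a label on the Service itself
  endpoints:
    - port: metrics              # must equal the Service port *name*, not its number
      interval: 15s
```

Comparing these values with the output of kubectl get svc --show-labels and the port names in the Service spec is usually the quickest way to spot such a mismatch.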