
Elasticsearch cluster on Kubernetes - nodes are not communicating


I have an Elasticsearch cluster (6.3) running on Kubernetes (GKE) with the following manifest file:

---
# Source: elasticsearch/templates/manifests.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch-configmap
  labels:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
data:
  elasticsearch.yml: |
    cluster.name: "${CLUSTER_NAME}"
    node.name: "${NODE_NAME}"

    path.data: /usr/share/elasticsearch/data
    path.repo: ["${BACKUP_REPO_PATH}"]

    network.host: 0.0.0.0

    discovery.zen.minimum_master_nodes: 1
    discovery.zen.ping.unicast.hosts: ${DISCOVERY_SERVICE}
  log4j2.properties: |
    status = error

    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n

    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  labels: &ElasticsearchDeploymentLabels
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
spec:
  selector:
    matchLabels: *ElasticsearchDeploymentLabels
  serviceName: elasticsearch-svc
  replicas: 2
  updateStrategy:
    # The procedure for updating the Elasticsearch cluster is described at
    # https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html
    type: OnDelete
  template:
    metadata:
      labels: *ElasticsearchDeploymentLabels
    spec:
      terminationGracePeriodSeconds: 180
      initContainers:
        # This init container sets the appropriate limits for mmap counts on the hosting node.
        # https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
        - name: set-max-map-count
          image: marketplace.gcr.io/google/elasticsearch/ubuntu16_04@...
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          command:
            - /bin/bash
            - -c
            - 'if [[ "$(sysctl vm.max_map_count --values)" -lt 262144 ]]; then sysctl -w vm.max_map_count=262144; fi'
      containers:
        - name: elasticsearch
          image: eu.gcr.io/projectId/elasticsearch6.3@sha256:...
          imagePullPolicy: Always
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CLUSTER_NAME
              value: "elasticsearch-cluster"
            - name: DISCOVERY_SERVICE
              value: "elasticsearch-svc"
            - name: BACKUP_REPO_PATH
              value: ""
          ports:
            - name: prometheus
              containerPort: 9114
              protocol: TCP
            - name: http
              containerPort: 9200
            - name: tcp-transport
              containerPort: 9300
          volumeMounts:
            - name: configmap
              mountPath: /etc/elasticsearch/elasticsearch.yml
              subPath: elasticsearch.yml
            - name: configmap
              mountPath: /etc/elasticsearch/log4j2.properties
              subPath: log4j2.properties
            - name: elasticsearch-pvc
              mountPath: /usr/share/elasticsearch/data
          readinessProbe:
            httpGet:
              path: /_cluster/health?local=true
              port: 9200
            initialDelaySeconds: 5
          livenessProbe:
            exec:
              command:
                - /usr/bin/pgrep
                - -x
                - "java"
            initialDelaySeconds: 5
          resources:
            requests:
              memory: "2Gi"

        - name: prometheus-to-sd
          image: marketplace.gcr.io/google/elasticsearch/prometheus-to-sd@sha256:8e3679a6e059d1806daae335ab08b304fd1d8d35cdff457baded7306b5af9ba5
          ports:
            - name: profiler
              containerPort: 6060
          command:
            - /monitor
            - --stackdriver-prefix=custom.googleapis.com
            - --source=elasticsearch:http://localhost:9114/metrics
            - --pod-id=$(POD_NAME)
            - --namespace-id=$(POD_NAMESPACE)
            - --monitored-resource-types=k8s
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace

      volumes:
        - name: configmap
          configMap:
            name: "elasticsearch-configmap"
  volumeClaimTemplates:
    - metadata:
        name: elasticsearch-pvc
        labels:
          app.kubernetes.io/name: "elasticsearch"
          app.kubernetes.io/component: elasticsearch-server
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 50Gi
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-prometheus-svc
  labels:
    app.kubernetes.io/name: elasticsearch
    app.kubernetes.io/component: elasticsearch-server
spec:
  clusterIP: None
  ports:
    - name: prometheus-port
      port: 9114
      protocol: TCP
  selector:
    app.kubernetes.io/name: elasticsearch
    app.kubernetes.io/component: elasticsearch-server
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-svc-internal
  labels:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
spec:
  ports:
    - name: http
      port: 9200
    - name: tcp-transport
      port: 9300
  selector:
    app.kubernetes.io/name: "elasticsearch"
    app.kubernetes.io/component: elasticsearch-server
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: ilb-service-elastic
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
  labels:
    app: elasticsearch-svc
spec:
  type: LoadBalancer
  loadBalancerIP: some-ip-address
  selector:
    app.kubernetes.io/component: elasticsearch-server
    app.kubernetes.io/name: elasticsearch
  ports:
    - port: 9200
      protocol: TCP

This manifest was written based on a template that used to be available on the GCP Marketplace.

I am running into the following issue: the cluster should have 2 nodes, and indeed 2 pods are running. However:

  • Calling ip:9200/_nodes returns only one node
  • A second node nevertheless appears to be up and receiving traffic (at least read traffic), as shown in its logs. Those requests usually fail because the requested entities do not exist on that node (only on the master).

I can't reconcile the fact that this node is invisible to the master while at the same time receiving read traffic from the load balancer that points at the stateful set.

Am I missing something subtle?

Have you tried checking the two node types, master node and data node? Only one master is elected at a time while the other master-eligible node stays in the background; if the first master goes down, a new one is elected and handles further requests.

I don't see any node-type configuration in the stateful set. I would suggest checking out the Elasticsearch Helm chart for setting up and deploying on GKE.

Helm chart: https://github.com/elastic/helm-charts/tree/main/elasticsearch
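To illustrate the point about node roles and master election: for an Elasticsearch 6.x cluster like the one in the question, the roles could be set explicitly in elasticsearch.yml. This is a hypothetical sketch, not the chart's actual values; note in particular that the manifest above sets discovery.zen.minimum_master_nodes to 1, while the 6.x quorum formula for two master-eligible nodes is (2 / 2) + 1 = 2. A value of 1 permits a split brain in which each pod elects itself master, which would match the symptoms described (two pods running, each seeing only one node).

```yaml
# Hypothetical elasticsearch.yml additions for a 2-node ES 6.3 cluster.
# Give each node an explicit role instead of relying on defaults:
node.master: true          # eligible to be elected master
node.data: true            # holds shard data
node.ingest: false         # no ingest pipelines on this node

# Quorum for 2 master-eligible nodes: (2 / 2) + 1 = 2.
# minimum_master_nodes: 1 (as in the question) allows each pod
# to elect itself master, i.e. a split brain.
discovery.zen.minimum_master_nodes: 2
```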

Sharing a sample env config for reference:

env:
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: CLUSTER_NAME
    value: my-es
  - name: NODE_MASTER
    value: "false"
  - name: NODE_INGEST
    value: "false"
  - name: HTTP_ENABLE
    value: "false"
  - name: ES_JAVA_OPTS
    value: -Xms256m -Xmx256m

Read more: https://faun.pub/https-medium-com-thakur-vaibhav23-ha-es-k8s-7e655c1b7b61
