Running a Kubernetes kubeadm cluster behind a corporate firewall/proxy server

Question

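We have a 5-node cluster that was moved behind our corporate firewall/proxy server.

Following the directions here: setting-up-standalone-kubernetes-cluster-behind-corporate-proxy

I set the proxy server environment variables using the following: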

export http_proxy=http://proxy-host:proxy-port/
export HTTP_PROXY=$http_proxy
export https_proxy=$http_proxy
export HTTPS_PROXY=$http_proxy
printf -v lan '%s,' localip_of_machine
printf -v pool '%s,' 192.168.0.{1..253}
printf -v service '%s,' 10.96.0.{1..253}
export no_proxy="${lan%,},${service%,},${pool%,},127.0.0.1";
export NO_PROXY=$no_proxy

Everything in our cluster now works internally. However, when I try to create a pod that pulls an image from outside, the pod gets stuck at ContainerCreating, e.g.,

[gms@thalia0 ~]$ kubectl apply -f https://k8s.io/examples/admin/dns/busybox.yaml
pod/busybox created

It gets stuck here:

[gms@thalia0 ~]$ kubectl get pods
NAME                            READY   STATUS              RESTARTS   AGE
busybox                         0/1     ContainerCreating   0          17m

I assume this is because the host/domain the images are pulled from is not in our corporate proxy rules. We do have rules for

k8s.io
kubernetes.io
docker.io
docker.com

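So I am not sure what other hosts/domains need to be added.

I ran a describe pods for busybox and saw references to node.kubernetes.io (I am putting in a domain-wide exception for *.kubernetes.io, which will hopefully suffice).

This is what I get from kubectl describe pods busybox: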

Volumes:
  default-token-2kfbw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2kfbw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From                          Message
  ----     ------                  ----  ----                          -------
  Normal   Scheduled               73s   default-scheduler             Successfully assigned default/busybox to thalia3.ahc.umn.edu
  Warning  FailedCreatePodSandBox  10s   kubelet, thalia3.ahc.umn.edu  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "6af48c5dadf6937f9747943603a3951bfaf25fe1e714cb0b0cbd4ff2d59aa918" network for pod "busybox": NetworkPlugin cni failed to set up pod "busybox_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout, failed to clean up sandbox container "6af48c5dadf6937f9747943603a3951bfaf25fe1e714cb0b0cbd4ff2d59aa918" network for pod "busybox": NetworkPlugin cni failed to teardown pod "busybox_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout]
  Normal   SandboxChanged          10s   kubelet, thalia3.ahc.umn.edu  Pod sandbox changed, it will be killed and re-created.

I assume the Calico error is due to this:

Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s

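The calico and coredns pods both seem to hit similar errors reaching node.kubernetes.io, so I assume this is due to our servers not being able to pull down the new images on a restart.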

Answer 1

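You seem to be misunderstanding a few Kubernetes concepts that I'd like to clarify here. References to node.kubernetes.io are not attempts to make any network calls to that domain. It is simply the convention Kubernetes uses to specify string keys. So if you ever have to apply labels, annotations, or tolerations, you can define your own keys like subdomain.domain.tld/some-key.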
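To make the point concrete, here is a minimal sketch that treats such a key purely as a string: an optional DNS-style prefix, a slash, and a name. The split_key helper is hypothetical (it is not part of kubectl); it only shows that nothing about the key involves a lookup of the prefix.

```shell
# Hypothetical helper: split a Kubernetes-style key into prefix and name.
# This is plain string handling; the prefix is never resolved or contacted.
split_key() {
  case $1 in
    */*) printf 'prefix=%s name=%s\n' "${1%%/*}" "${1#*/}" ;;
    *)   printf 'prefix= name=%s\n' "$1" ;;
  esac
}

split_key node.kubernetes.io/not-ready    # prints: prefix=node.kubernetes.io name=not-ready
split_key subdomain.domain.tld/some-key   # prints: prefix=subdomain.domain.tld name=some-key
```

The same holds for keys you define yourself: a command along the lines of kubectl label node thalia3.ahc.umn.edu subdomain.domain.tld/some-key=some-value attaches a label without ever contacting subdomain.domain.tld.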

As for the Calico issue you are experiencing, it looks like the error:

network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout]

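is our culprit here. 10.96.0.1 is the IP address used to refer to the Kubernetes API server from within pods. It seems the calico/node pod running on your node is failing to reach the API server. Could you give more context on how you set up Calico? Do you know what version of Calico you are running?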
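One way to narrow this down is to test the service IP directly from the affected node. The following is only a sketch: api_reachable is a made-up helper name, and it assumes curl is installed on the node. It passes --noproxy '*' so that curl ignores the http_proxy/https_proxy variables exported earlier, and -k because only reachability is being tested, not the certificate.

```shell
# Hypothetical helper: report whether an HTTPS endpoint answers within 5 seconds.
# Any HTTP response (even 401/403) counts as reachable; only connection
# failures and timeouts make curl exit non-zero here.
api_reachable() {
  host=$1
  port=${2:-443}
  if curl -sk --noproxy '*' --max-time 5 -o /dev/null "https://${host}:${port}/version"; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}
```

On thalia3 you could run api_reachable 10.96.0.1 and then repeat it against the API server's own address and port (6443 is the kubeadm default). If the direct address answers but 10.96.0.1 does not, the problem is more likely in the service routing on that node than in the API server itself.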

The fact that your calico/node instance is trying to access the crd.projectcalico.org/v1/clusterinformations resource tells me that it is using the Kubernetes datastore for its backend. Are you sure you're not trying to run Calico in Etcd mode?
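If it helps, both questions can usually be answered from the DaemonSet itself. This sketch assumes the stock Calico manifest, i.e. a DaemonSet named calico-node in the kube-system namespace; adjust the names if your install differs. The two function names are made up for illustration.

```shell
# Assumption: Calico was installed from the standard manifest, which creates
# a DaemonSet called calico-node in kube-system.

# Print the container images; the image tag is the Calico version.
calico_images() {
  kubectl -n kube-system get daemonset calico-node \
    -o jsonpath='{.spec.template.spec.containers[*].image}'
}

# Show the DATASTORE_TYPE environment variable and the line after it (its value).
calico_datastore() {
  kubectl -n kube-system get daemonset calico-node -o yaml \
    | grep -A1 'DATASTORE_TYPE'
}
```

In manifests for the Kubernetes datastore this variable is typically set to kubernetes, while etcd-backed manifests normally carry an ETCD_ENDPOINTS setting instead.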

Answer 2

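There doesn't seem to be any problem pulling images, since otherwise you would see an ImagePullBackOff status. (Although that may come later, after the error message you are seeing.)

The error you see from your pods is related to them not being able to connect to the kube-apiserver internally. It looks like a timeout, so most likely there is something wrong with the kubernetes service in your default namespace. You can check it like this, for example: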

$ kubectl -n default get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   2d20h

It may be missing (?). You can always recreate it:

$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.96.0.1
  type: ClusterIP
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
EOF
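If the service is present, it may also be worth looking at what sits behind it. A rough sketch, assuming nothing beyond working kubectl access (the kubernetes_endpoints name is made up here):

```shell
# Print the backend IP(s) and port(s) that the "kubernetes" service in the
# default namespace forwards to.
kubernetes_endpoints() {
  kubectl -n default get endpoints kubernetes \
    -o jsonpath='{.subsets[*].addresses[*].ip}{" "}{.subsets[*].ports[*].port}{"\n"}'
}
```

The addresses printed should belong to your control-plane node(s). Traffic sent to 10.96.0.1:443 ends up at that address and port, so it needs to be reachable from every node.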

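The tolerations are basically saying that the pod can tolerate being scheduled on a node that has the node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute taints, but your error doesn't look related to that.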
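To see for yourself whether any node actually carries those taints, something like the following should do. It is a sketch only; node_taints is a made-up function name.

```shell
# List every node together with the keys of any taints set on it.
# Nodes without taints show <none> in the TAINTS column.
node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

If thalia3.ahc.umn.edu shows neither not-ready nor unreachable, the tolerations in the describe output are just defaults on the pod and not a sign of a problem.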

Answer 3

This issue usually means the docker daemon is unable to respond.

It can occur if some other service is consuming a lot of CPU or I/O.
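A quick way to test that theory on the affected node is to ask the daemon for its version. This is a minimal sketch (docker_health is a hypothetical name), and note that the call can itself block for a while if the daemon is hung.

```shell
# Hypothetical helper: check whether the docker daemon answers a simple request.
# "docker version" has to talk to the daemon to fill in the server version,
# so it fails when the daemon is down or unreachable.
docker_health() {
  if docker version --format '{{.Server.Version}}' >/dev/null 2>&1; then
    echo "docker daemon responding"
  else
    echo "docker daemon not responding"
    return 1
  fi
}
```

If it reports that the daemon is not responding, the docker and kubelet logs on that node (for example via journalctl -u docker and journalctl -u kubelet on a systemd host) are the next place to look.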

3 answers

Answer 1: 1, 2019-04-06 03:48:09
Answer 2: 0, 2019-04-05 22:00:52
Answer 3: 0, 2019-04-06 00:20:39
