Kubernetes The Hard Way on AWS - Deploy and configure cloud-controller-manager

I've followed the guide Kubernetes the hard way and its adaptation for AWS, Kubernetes The Hard Way - AWS.

Everything runs fine with the DNS add-on and even the dashboard, as explained here.

But if I create a LoadBalancer Service, it doesn't work, because cloud-controller-manager isn't deployed (neither as a master component nor as a DaemonSet).
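
For reference, this is the kind of Service I mean (a minimal illustration; the name, selector and ports are placeholders, not taken from my actual manifests):

apiVersion: v1
kind: Service
metadata:
  name: example-lb            # placeholder name
spec:
  type: LoadBalancer          # needs a cloud provider integration to provision an ELB
  selector:
    app: example              # placeholder selector
  ports:
  - port: 80
    targetPort: 8080

Without a working cloud provider integration, the external IP of such a Service never gets assigned and stays pending.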

I read https://kubernetes.io/docs/tasks/administer-cluster/running-cloud-controller/ to get some information on how to deploy it, but after applying the required change (on the kubelet: --cloud-provider=external) and deploying this DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    k8s-app: cloud-controller-manager
  name: cloud-controller-manager
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: cloud-controller-manager
  template:
    metadata:
      labels:
        k8s-app: cloud-controller-manager
    spec:
      serviceAccountName: cloud-controller-manager
      containers:
      - name: cloud-controller-manager
        image: k8s.gcr.io/cloud-controller-manager:v1.8.0
        command:
        - /usr/local/bin/cloud-controller-manager
        - --cloud-provider=aws
        - --leader-elect=true
        - --use-service-account-credentials
        - --allocate-node-cidrs=true
        - --configure-cloud-routes=true
        - --cluster-cidr=${CLUSTERCIRD}
      tolerations:
      - key: node.cloudprovider.kubernetes.io/uninitialized
        value: "true"
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      nodeSelector:
        node-role.kubernetes.io/master: ""
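
For completeness, the kubelet change mentioned above is just one extra flag on the kubelet unit (a fragment only; the other flags from the guide are unchanged and omitted here, and paths may differ in your setup):

# kubelet.service (fragment; remaining flags from the guide omitted)
ExecStart=/usr/local/bin/kubelet \
  --cloud-provider=external \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --register-node=true

With --cloud-provider=external, the kubelet registers the node with the node.cloudprovider.kubernetes.io/uninitialized taint, which is why the DaemonSet above carries the matching toleration.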

The instances (controllers and workers) all have the right roles.

I can't even create a pod; the status stays "Pending"...

Do you know how to deploy cloud-controller-manager as a DaemonSet or a master component (without using kops, kubeadm, ...) on an AWS cluster?

Do you know of a guide that could help me with that?

Could you give an example of a cloud-controller-manager DaemonSet configuration?

Thanks in advance.

UPDATE

When executing kubectl get nodes, I get No resources found.

And when describing a launched pod, I get only one event:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  28s (x2 over 28s)  default-scheduler  no nodes available to schedule pods

The question should now be: how do I get nodes ready, with cloud-controller-manager deployed for AWS?

As samhain1138 mentioned, your cluster does not look healthy enough to install anything. In simple cases it can be fixed, but sometimes it is better to reinstall everything.

Let's try to investigate the problem.
First of all, check the state of your master node. Usually, that means making sure the kubelet service is running.
Check the kubelet log for errors:

$ journalctl -u kubelet
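
If the log is long, it can help to first confirm that the service is actually running and then filter for errors (optional variations of the same check):

$ systemctl status kubelet
$ journalctl -u kubelet --no-pager | grep -i error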

Next, check the state of your static pods. You can find a list of them in the /etc/kubernetes/manifests directory:

$ ls /etc/kubernetes/manifests

etcd.yaml  
kube-apiserver.yaml  
kube-controller-manager.yaml  
kube-scheduler.yaml

$ docker ps

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS               NAMES
5cbdc1c13c25        8a7739f672b4           "/sidecar --v=2 --..."   2 weeks ago         Up 2 weeks                              k8s_sidecar_kube-dns-86c47599bd-l7d6m_kube-system_...
bd96ffafdfa6        6816817d9dce           "/dnsmasq-nanny -v..."   2 weeks ago         Up 2 weeks                              k8s_dnsmasq_kube-dns-86c47599bd-l7d6m_kube-system_...
69931b5b4cf9        55ffe31ac578           "/kube-dns --domai..."   2 weeks ago         Up 2 weeks                              k8s_kubedns_kube-dns-86c47599bd-l7d6m_kube-system_...
60885aeffc05        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_kube-dns-86c47599bd-l7d6m_kube-system_...
93144593660c        9f355e076ea7           "/install-cni.sh"        2 weeks ago         Up 2 weeks                              k8s_install-cni_calico-node-nxljq_kube-system_...
b55f57529671        7eca10056c8e           "start_runit"            2 weeks ago         Up 2 weeks                              k8s_calico-node_calico-node-nxljq_kube-system_...
d8767b9c07c8        46a3cd725628           "/usr/local/bin/ku..."   2 weeks ago         Up 2 weeks                              k8s_kube-proxy_kube-proxy-lf8gd_kube-system_...
f924cefb953f        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_calico-node-nxljq_kube-system_...
09ceddabdeb9        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_kube-proxy-lf8gd_kube-system_...
9fc90839bb6f        821507941e9c           "kube-apiserver --..."   2 weeks ago         Up 2 weeks                              k8s_kube-apiserver_kube-apiserver-kube-master_kube-system_...
8ea410ce00a6        b8df3b177be2           "etcd --advertise-..."   2 weeks ago         Up 2 weeks                              k8s_etcd_etcd-kube-master_kube-system_...
dd7f9b381e4f        38521457c799           "kube-controller-m..."   2 weeks ago         Up 2 weeks                              k8s_kube-controller-manager_kube-controller-manager-kube-master_kube-system_...
f6681365bea8        37a1403e6c1a           "kube-scheduler --..."   2 weeks ago         Up 2 weeks                              k8s_kube-scheduler_kube-scheduler-kube-master_kube-system_...
0638e47ec57e        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_etcd-kube-master_kube-system_...
5bbe35abb3a3        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_kube-controller-manager-kube-master_kube-system_...
2dc6ee716bb4        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_kube-scheduler-kube-master_kube-system_...
b15dfc9f089a        k8s.gcr.io/pause:3.1   "/pause"                 2 weeks ago         Up 2 weeks                              k8s_POD_kube-apiserver-kube-master_kube-system_...

You can see a detailed description of any pod's container using the command:

$ docker inspect <container_id>

Or check the logs:

$ docker logs <container_id>
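
If the API server itself is still reachable, the same logs can also be read through kubectl instead of docker (the pod names here match the ones visible in the docker ps output above):

$ kubectl -n kube-system logs kube-controller-manager-kube-master
$ kubectl -n kube-system logs kube-apiserver-kube-master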

This should be enough to work out what to do next: either try to fix the cluster, or tear everything down and start from the beginning.

To simplify the process of provisioning a Kubernetes cluster, you could use kubeadm as follows:

# This instruction is for ubuntu VMs, if you use CentOS, the commands will be
# slightly different.

### These steps are the same for the master and the worker nodes
# become root
$ sudo su

# add repository and keys
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

# install components
$ apt-get update
$ apt-get -y install ebtables ethtool docker.io apt-transport-https kubelet kubeadm kubectl

# adjust sysctl settings
$ cat <<EOF >>/etc/ufw/sysctl.conf
net/ipv4/ip_forward = 1
net/bridge/bridge-nf-call-ip6tables = 1
net/bridge/bridge-nf-call-iptables = 1
net/bridge/bridge-nf-call-arptables = 1
EOF

$ sysctl --system

### Next steps are for the master node only.

# Create Kubernetes cluster
$ kubeadm init --pod-network-cidr=192.168.0.0/16
or if you want to use older KubeDNS instead of CoreDNS:
$ kubeadm init --pod-network-cidr=192.168.0.0/16 --feature-gates=CoreDNS=false

# Configure kubectl
$ mkdir -p $HOME/.kube
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ chown $(id -u):$(id -g) $HOME/.kube/config

# install Calico network
$ kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
# or install Flannel (not both)
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# Untaint master or/and join other nodes:
$ kubectl taint nodes --all node-role.kubernetes.io/master-

# run on master if you forgot the join command:
$ kubeadm token create --print-join-command

# run command printed on the previous step on the worker node to join it to the existing cluster.
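
# The printed join command looks roughly like this (all values are placeholders):
# kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>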

# At this point you should have a ready-to-use Kubernetes cluster.
$ kubectl get nodes -o wide
$ kubectl get pods,svc,deployments,daemonsets --all-namespaces

After recovering the cluster, could you try to install cloud-controller-manager again and share the results?

Forget about cloud-controller-manager: you don't seem to have a functioning Kubernetes cluster to run it on!
Kubernetes tells you exactly that, but you ignored it...

No offense, but if you aren't experienced with Kubernetes, maybe you shouldn't try to follow a guide called Kubernetes The Hard Way (you failed, and you haven't provided any information for me to point out exactly why/how), but use kops or kubeadm instead?

I had the same issue trying to set a cloud provider with GCE. I solved the problem by adding the following flags to kube-apiserver.service, kubelet.service, and kube-controller-manager.service:

--cloud-provider=gce \
--cloud-config=/var/lib/gce.conf \

The gce.conf file was based on the JSON key file generated for a Google IAM service account, but in Gcfg format. I'm sure AWS has something similar. The format looks like this:

[Global]
type = xxx
project-id = xxx
private-key-id = xxx
private-key = xxx
client-email = xxx
client-id = xxx
auth-uri = xxx
token-uri = xxx
auth-provider-x509-cert-url = xxx
client-x509-cert-url = xxx
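
For AWS, the analogous setup would be --cloud-provider=aws plus a cloud-config file in the same Gcfg style. A minimal sketch (the field names below are my assumption based on the AWS cloud provider's options; verify them against the documentation for your version):

[Global]
Zone = us-east-1a
KubernetesClusterTag = my-cluster
KubernetesClusterID = my-cluster
DisableSecurityGroupIngress = false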

For more info, see the Kubernetes documentation on cloud providers.
