Kubernetes pods can't communicate using weave

I have a simple cluster set up, with just a single master (running CoreOS) for the time being. The kubelet runs via the kubelet-wrapper script from CoreOS, and I'm using Weave for the pod network.

The API server, controller manager, and scheduler all run properly as systemd units with host networking.

My problem is that the pods can't communicate with each other, with service IPs, or with internet IPs. Each pod appears to have a network interface, a route, and a default gateway, but every attempt fails with "no route to host".

$ kubectl  run -i --tty busybox --image=busybox --generator="run-pod/v1" --overrides='{"spec": {"template": {"metadata": {"annotations": {"scheduler.alpha.kubernetes.io/tolerations": "[{\"key\":\"dedicated\",\"value\":\"master\",\"effect\":\"NoSchedule\"}]"}}}}}'
Waiting for pod default/busybox to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.
/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.32.0.1       0.0.0.0         UG    0      0        0 eth0
10.32.0.0       0.0.0.0         255.240.0.0     U     0      0        0 eth0
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
16: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1410 qdisc noqueue 
    link/ether 62:d0:49:a6:f9:59 brd ff:ff:ff:ff:ff:ff
    inet 10.32.0.7/12 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::60d0:49ff:fea6:f959/64 scope link 
       valid_lft forever preferred_lft forever
/ # ping 10.32.0.6 -c 5
PING 10.32.0.6 (10.32.0.6): 56 data bytes

--- 10.32.0.6 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
/ # wget http://10.32.0.6:80/
Connecting to 10.32.0.6:80 (10.32.0.6:80)
wget: can't connect to remote host (10.32.0.6): No route to host

Internet IPs also don't work:

/ # ping -c 5 172.217.24.132
PING 172.217.24.132 (172.217.24.132): 56 data bytes

--- 172.217.24.132 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
/ # wget http://172.217.24.132/
Connecting to 172.217.24.132 (172.217.24.132:80)
wget: can't connect to remote host (172.217.24.132): No route to host
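
For reference, the next thing I'd check from inside the pod is whether the default gateway itself (10.32.0.1, the Weave bridge address per the route table above) responds, and whether ARP resolves for it (assuming the busybox ip applet includes the neigh subcommand):

/ # ping -c 3 10.32.0.1    # the default gateway (weave bridge) on this node
/ # ip neigh               # is there a resolved ARP entry for 10.32.0.1?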

My kubelet unit is as follows:

[Service]
ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/manifests
ExecStartPre=/usr/bin/mkdir -p /var/log/containers

Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --hostname-override=192.168.86.50"
Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true"
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"
Environment="KUBELET_DNS_ARGS=--cluster-dns=10.3.0.10 --cluster-domain=cluster.local"
Environment="KUBELET_EXTRA_ARGS=--v=4"

Environment=KUBELET_VERSION=v1.4.6_coreos.0
Environment="RKT_OPTS=--volume var-log,kind=host,source=/var/log \
  --mount volume=var-log,target=/var/log \
  --volume dns,kind=host,source=/etc/resolv.conf \
  --mount volume=dns,target=/etc/resolv.conf \
  --volume cni-conf,kind=host,source=/etc/cni \
  --mount volume=cni-conf,target=/etc/cni \
  --volume cni-bin,kind=host,source=/opt/cni \
  --mount volume=cni-bin,target=/opt/cni"

ExecStart=/usr/lib/coreos/kubelet-wrapper \
  $KUBELET_KUBECONFIG_ARGS \
  $KUBELET_SYSTEM_PODS_ARGS \
  $KUBELET_NETWORK_ARGS \
  $KUBELET_DNS_ARGS \
  $KUBELET_EXTRA_ARGS
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target

I have the weave DaemonSet running in the cluster.

$ kubectl --kubeconfig=ansible/roles/kubernetes-master/admin-user/files/kubeconfig -n kube-system get daemonset
NAME               DESIRED   CURRENT   NODE-SELECTOR   AGE
kube-proxy-amd64   1         1         <none>          22h
weave-net          1         1         <none>          22h
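
To confirm the DaemonSet pods are actually Running (and where they were scheduled), one can also check:

$ kubectl -n kube-system get pods -o wide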

Weave logs look like this:

$ kubectl -n kube-system logs weave-net-me1lz weave
INFO: 2016/12/19 02:19:56.125264 Command line options: map[docker-api: http-addr:127.0.0.1:6784 ipalloc-init:consensus=1 nickname:ia-master1 status-addr:0.0.0.0:6782 datapath:datapath ipalloc-range:10.32.0.0/12 name:52:b1:20:55:0c:fc no-dns:true port:6783]
INFO: 2016/12/19 02:19:56.213194 Communication between peers is unencrypted.
INFO: 2016/12/19 02:19:56.237440 Our name is 52:b1:20:55:0c:fc(ia-master1)
INFO: 2016/12/19 02:19:56.238232 Launch detected - using supplied peer list: [192.168.86.50]
INFO: 2016/12/19 02:19:56.258050 [allocator 52:b1:20:55:0c:fc] Initialising with persisted data
INFO: 2016/12/19 02:19:56.258412 Sniffing traffic on datapath (via ODP)
INFO: 2016/12/19 02:19:56.293898 ->[192.168.86.50:6783] attempting connection
INFO: 2016/12/19 02:19:56.311408 Discovered local MAC 52:b1:20:55:0c:fc
INFO: 2016/12/19 02:19:56.314972 ->[192.168.86.50:47921] connection accepted
INFO: 2016/12/19 02:19:56.370597 ->[192.168.86.50:47921|52:b1:20:55:0c:fc(ia-master1)]: connection shutting down due to error: cannot connect to ourself
INFO: 2016/12/19 02:19:56.381759 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2016/12/19 02:19:56.391405 ->[192.168.86.50:6783|52:b1:20:55:0c:fc(ia-master1)]: connection shutting down due to error: cannot connect to ourself
INFO: 2016/12/19 02:19:56.423633 Listening for metrics requests on 0.0.0.0:6782
INFO: 2016/12/19 02:19:56.990760 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.7.3-coreos-r3&os=linux&signature=1Pty%2FGagYcrEs2TwKnz6IVegmP23z5ifqrP1D9vCzyM%3D&version=1.8.2: x509: failed to load system roots and no roots provided
10.32.0.1
INFO: 2016/12/19 02:19:57.490053 Discovered local MAC 3a:5c:04:54:80:7c
INFO: 2016/12/19 02:19:57.591131 Discovered local MAC c6:1c:f5:43:f0:91
INFO: 2016/12/19 02:34:56.242774 Expired MAC c6:1c:f5:43:f0:91 at 52:b1:20:55:0c:fc(ia-master1)
INFO: 2016/12/19 03:46:29.865157 ->[192.168.86.200:49276] connection accepted
INFO: 2016/12/19 03:46:29.866767 ->[192.168.86.200:49276] connection shutting down due to error during handshake: remote protocol header not recognised: [71 69 84 32 47]
INFO: 2016/12/19 03:46:34.704116 ->[192.168.86.200:49278] connection accepted
INFO: 2016/12/19 03:46:34.754782 ->[192.168.86.200:49278] connection shutting down due to error during handshake: remote protocol header not recognised: [22 3 1 0 242]
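
The "cannot connect to ourself" lines are expected on a single-node cluster (the router is handed its own address in the peer list), and the checkpoint-api x509 error is only the version check failing. Since the router listens for HTTP control messages on 127.0.0.1:6784 (per the log above), its view of peers and IP allocation can also be queried directly from the master:

$ curl -s http://127.0.0.1:6784/status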

The weave CNI plugin binaries seem to be created properly.

core@ia-master1 ~ $ ls /opt/cni/bin/
bridge  cnitool  dhcp  flannel  host-local  ipvlan  loopback  macvlan  ptp  tuning  weave-ipam  weave-net  weave-plugin-1.8.2
core@ia-master1 ~ $ ls /etc/cni/net.d/
10-weave.conf
core@ia-master1 ~ $ cat /etc/cni/net.d/10-weave.conf 
{
    "name": "weave",
    "type": "weave-net"
}
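
Since cnitool is among the binaries above, the Weave CNI plugin can in principle be exercised by hand, outside Kubernetes, to separate CNI problems from kubelet problems. A sketch (assuming cnitool's old "add/del <network> <netns>" invocation):

$ sudo ip netns add cni-test
$ sudo env CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    /opt/cni/bin/cnitool add weave /var/run/netns/cni-test
$ sudo ip netns exec cni-test ip addr   # did the netns get a 10.32.0.0/12 address?
$ sudo env CNI_PATH=/opt/cni/bin NETCONFPATH=/etc/cni/net.d \
    /opt/cni/bin/cnitool del weave /var/run/netns/cni-test
$ sudo ip netns del cni-test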

The iptables rules look like this:

core@ia-master1 ~ $ sudo iptables-save
# Generated by iptables-save v1.4.21 on Mon Dec 19 04:15:14 2016
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [2:120]
:POSTROUTING ACCEPT [2:120]
:DOCKER - [0:0]
:KUBE-MARK-DROP - [0:0]
:KUBE-MARK-MASQ - [0:0]
:KUBE-NODEPORTS - [0:0]
:KUBE-POSTROUTING - [0:0]
:KUBE-SEP-AN54BNMS4EGIFEJM - [0:0]
:KUBE-SEP-BQM5WFNH2M6QPJV6 - [0:0]
:KUBE-SERVICES - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
:KUBE-SVC-NWV5X2332I4OT4T3 - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
:WEAVE - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -j WEAVE
-A DOCKER -i docker0 -j RETURN
-A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
-A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
-A KUBE-SEP-AN54BNMS4EGIFEJM -s 192.168.86.50/32 -m comment --comment "default/kubernetes:https" -j KUBE-MARK-MASQ
-A KUBE-SEP-AN54BNMS4EGIFEJM -p tcp -m comment --comment "default/kubernetes:https" -m recent --set --name KUBE-SEP-AN54BNMS4EGIFEJM --mask 255.255.255.255 --rsource -m tcp -j DNAT --to-destination 192.168.86.50:443
-A KUBE-SEP-BQM5WFNH2M6QPJV6 -s 10.32.0.6/32 -m comment --comment "default/hostnames:" -j KUBE-MARK-MASQ
-A KUBE-SEP-BQM5WFNH2M6QPJV6 -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.32.0.6:9376
-A KUBE-SERVICES -d 10.3.0.137/32 -p tcp -m comment --comment "default/hostnames: cluster IP" -m tcp --dport 80 -j KUBE-SVC-NWV5X2332I4OT4T3
-A KUBE-SERVICES -d 10.3.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SERVICES -d 10.3.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SERVICES -d 10.3.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -m recent --rcheck --seconds 180 --reap --name KUBE-SEP-AN54BNMS4EGIFEJM --mask 255.255.255.255 --rsource -j KUBE-SEP-AN54BNMS4EGIFEJM
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https" -j KUBE-SEP-AN54BNMS4EGIFEJM
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -j KUBE-SEP-BQM5WFNH2M6QPJV6
-A WEAVE -s 10.32.0.0/12 -d 224.0.0.0/4 -j RETURN
-A WEAVE ! -s 10.32.0.0/12 -d 10.32.0.0/12 -j MASQUERADE
-A WEAVE -s 10.32.0.0/12 ! -d 10.32.0.0/12 -j MASQUERADE
COMMIT
# Completed on Mon Dec 19 04:15:14 2016
# Generated by iptables-save v1.4.21 on Mon Dec 19 04:15:14 2016
*filter
:INPUT ACCEPT [73:57513]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [72:61109]
:DOCKER - [0:0]
:DOCKER-ISOLATION - [0:0]
:KUBE-FIREWALL - [0:0]
:KUBE-SERVICES - [0:0]
:WEAVE-NPC - [0:0]
:WEAVE-NPC-DEFAULT - [0:0]
:WEAVE-NPC-INGRESS - [0:0]
-A INPUT -j KUBE-FIREWALL
-A INPUT -d 172.17.0.1/32 -i docker0 -p tcp -m tcp --dport 6783 -j DROP
-A INPUT -d 172.17.0.1/32 -i docker0 -p udp -m udp --dport 6783 -j DROP
-A INPUT -d 172.17.0.1/32 -i docker0 -p udp -m udp --dport 6784 -j DROP
-A INPUT -i docker0 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i docker0 -p tcp -m tcp --dport 53 -j ACCEPT
-A FORWARD -i docker0 -o weave -j DROP
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o weave -j WEAVE-NPC
-A FORWARD -o weave -m state --state NEW -j NFLOG --nflog-group 86
-A FORWARD -o weave -j DROP
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -j KUBE-FIREWALL
-A DOCKER-ISOLATION -j RETURN
-A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
-A KUBE-SERVICES -d 10.3.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns has no endpoints" -m udp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A KUBE-SERVICES -d 10.3.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp has no endpoints" -m tcp --dport 53 -j REJECT --reject-with icmp-port-unreachable
-A WEAVE-NPC -m state --state RELATED,ESTABLISHED -j ACCEPT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-DEFAULT
-A WEAVE-NPC -m state --state NEW -j WEAVE-NPC-INGRESS
-A WEAVE-NPC-DEFAULT -m set --match-set weave-k?Z;25^M}|1s7P3|H9i;*;MhG dst -j ACCEPT
-A WEAVE-NPC-DEFAULT -m set --match-set weave-iuZcey(5DeXbzgRFs8Szo]<@p dst -j ACCEPT
COMMIT
# Completed on Mon Dec 19 04:15:14 2016
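
If it helps with diagnosis, the per-rule packet counters can show where pod traffic dies; in particular, if the counter on the final "-A FORWARD -o weave -j DROP" rule climbs while a pod pings, weave-npc is rejecting the traffic:

$ sudo iptables -Z FORWARD                        # zero the FORWARD counters
# ...ping 10.32.0.6 from the busybox pod, then:
$ sudo iptables -L FORWARD -v -n --line-numbers   # which rules matched?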

What am I doing wrong?

Possibly relevant info:

  • 192.168.86.50 is the master node's IP
  • Pod network CIDR: 10.32.0.0/12 (at least this is what it is for the master node)
  • Service CIDR: 10.3.0.0/24
  • API server cluster IP: 10.3.0.1

I've run into this myself, and I think it's more of a CoreOS bug.

CoreOS uses a network daemon, systemd-networkd, to manage network interfaces, but in some cases (like here) you want to manage certain interfaces yourself. CNI is broken on CoreOS because networkd tries to manage the CNI interface and the Weave interfaces, interfering with the configuration those plugins set up.
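
You can check whether networkd has claimed the Weave and CNI links: networkctl lists every link with its SETUP state, and anything other than "unmanaged" for weave, datapath, vxlan-*, or cni* means networkd is touching interfaces it shouldn't:

$ networkctl list
$ networkctl status weave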

I would try something like this in /etc/systemd/network/50-weave.network (I'm using CoreOS alpha):

[Match]
Name=weave datapath vxlan-* dummy*

[Link]
# Unmanaged= is a [Link]-section option; it tells networkd to leave these links alone.
Unmanaged=yes

[Network]
# I'm not sure if DHCP or IPv6AcceptRA are required here...
DHCP=no
IPv6AcceptRA=false

And this for /etc/systemd/network/50-cni.network:

[Match]
Name=cni*

[Link]
Unmanaged=yes

Then reboot and see if it works! (or you might want to try CoreOS stable, assuming you're on alpha now)
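
If you'd rather not reboot, restarting networkd should be enough to pick up the new .network files (a sketch; I haven't verified this on your exact CoreOS build):

$ sudo systemctl restart systemd-networkd
$ networkctl list   # weave, datapath, and cni* should now show "unmanaged"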
