k8s, RabbitMQ and Peer Discovery

Question

We are trying to run an instance of the RabbitMQ chart with Helm from the helm/charts/stable/rabbit project. I had it running perfectly, but then I had to restart k8s for some maintenance. Now we are completely unable to launch the RabbitMQ chart in any way, shape or form. I am not even trying to run the chart with any overrides, i.e. just the default values.

Here is all I am doing:

helm install stable/rabbitmq

I have confirmed that the default chart runs fine on my local k8s, which I'm running with Docker for Desktop. When we run the rabbit chart on our shared k8s in exactly the same way as on the desktop, and as we did before the restart, the following error is thrown:

Failed to get nodes from k8s - 503

I have also posted an issue on the Helm charts repo on GitHub.

We suspect DNS but are unable to confirm anything yet. What is very frustrating is that after the restart every other chart we had installed restarted perfectly, except Rabbit, which now will not start at all.

Does anyone know what I could do to get Rabbit's peer discovery to work? Has anyone seen an issue like this after restarting k8s?

Answer 1

So I actually got rabbit to run. It turns out my issue was that the k8s peer discovery could not connect over the default port 443, and I had to use the external port 6443, because kubernetes.default.svc.cluster.local resolved to the public endpoint and could not find the internal one. So yes, our config is messed up too.
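A quick way to check which of the two ports is reachable is a plain TCP probe run from inside a pod on the affected cluster. The following is only a minimal sketch: the function name probe_ports is made up for illustration, the host and ports are the ones mentioned above, and a successful TCP connection does not prove that the API server will accept the peer discovery request.

```python
import socket


def probe_ports(host, ports, timeout=2.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers DNS failures (socket.gaierror), refusals and timeouts.
            results[port] = False
    return results


if __name__ == "__main__":
    # Host and ports taken from the answer above; run this from inside a pod.
    print(probe_ports("kubernetes.default.svc.cluster.local", [443, 6443]))
```

If 443 reports False and 6443 reports True, that would match the behaviour described in this answer.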

It took me a while to realize that the variable below was not being overridden when I passed my own values file with helm install . -f server-values.yaml.

rabbitmq:
  configuration: |-
    ## Clustering
    cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    cluster_formation.k8s.port = 6443
    cluster_formation.node_cleanup.interval = 10
    cluster_formation.node_cleanup.only_log_warning = true
    cluster_partition_handling = autoheal
    # queue master locator
    queue_master_locator=min-masters
    # enable guest user
    loopback_users.guest = false

I had to add cluster_formation.k8s.port = 6443 to the chart's main values.yaml file instead of my own. Once the port was changed directly in values.yaml, rabbit started right up.
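To see whether an override actually reached the broker configuration, it can help to inspect the rendered configuration text (for example, taken from the manifests that helm template produces) rather than the values files. Below is a minimal sketch that parses such text; parse_rabbitmq_conf is a hypothetical helper, and the sample string is just an excerpt of the configuration shown above.

```python
def parse_rabbitmq_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Excerpt of the configuration from this answer, used as sample input.
rendered = """
## Clustering
cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.port = 6443
queue_master_locator=min-masters
"""

conf = parse_rabbitmq_conf(rendered)
# If the key is missing, the override did not make it into the rendered config.
print(conf.get("cluster_formation.k8s.port", "not set"))
```

If the key comes back as "not set", the values file passed with -f was most likely not merged the way it was expected to be.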

Answer 2

I'm wondering what the reason is for using the rabbit_peer_discovery_k8s plugin if values.yaml defaults to 1 replica (your manifest file does not override this setting)?

I tried to reproduce your issue with the override values you provided (dev-server.yaml), as per the details in your GitHub issue #10811, but I was only partly able to do so. Here are my observations:

If I install the RabbitMQ chart with your custom values, my rabbitmq-dev-default-0 pod gets stuck in the CrashLoopBackOff state. It is quite hard for me to troubleshoot it further, as Bitnami's rabbitmq image containers, used by this rabbitmq Helm chart, are shipped with a non-root account.
On the other hand, if the rabbitmq chart is installed on my Kubernetes cluster (v1.13.2) in its simplest form:

helm install stable/rabbitmq

I observe a similar issue. I mean the rabbitmq server survives a simulated VM restart of all cluster nodes (including master), but I cannot connect to it from outside:

After the VM restart, I'm getting the following error from my python mqclient:

socket.gaierror: [Errno -2] Name or service not known

A few remarks here:

Yes, I did forward the port(s) as per the instructions from the "helm status" command.

The readiness probe works fine:

curl -sS -f --user user:<my_pwd> 127.0.0.1:15672/api/healthchecks/node
{"status":"ok"}

rabbitmqctl to rabbitmq-server connectivity from inside the container works fine too:

kubectl exec rabbitmq-dev-default-0 -- rabbitmqctl list_queues
warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name    messages
hello   11

From the moment I used kubectl port-forward to the pod instead of the service, connectivity to the rabbitmq server was restored:

kubectl port-forward --namespace default pod/rabbitmq-dev-default-0 5672:5672

$ python send.py
 [x] Sent 'Hello World!'
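The socket.gaierror shown earlier is raised during name resolution, before any TCP connection is attempted, so it points at the hostname the client uses rather than at the broker itself. The sketch below separates the two failure modes; diagnose is a hypothetical helper, and the host and port in the usage line simply assume a local port-forward like the one above.

```python
import socket


def diagnose(host, port, timeout=2.0):
    """Classify a connection attempt as 'dns_failure', 'connect_failure' or 'ok'."""
    try:
        candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same error class as "Name or service not known".
        return "dns_failure"
    for family, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Try the next resolved address, if any.
            continue
    return "connect_failure"


if __name__ == "__main__":
    # Assumes the AMQP port is forwarded to localhost as shown above.
    print(diagnose("127.0.0.1", 5672))
```

A "dns_failure" result suggests checking the hostname configured in the client, while "connect_failure" suggests the forward or the service behind it is not accepting connections.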
