Azure load balancer health probe failure

2022-05-22 03:55:03 · azure-load-balancer

I have gone through https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-custom-probe-overview but I haven't found an answer.

Problem: I have TensorFlow applications running on individual VMs, each served by a gunicorn + Flask application. The intention is to ensure that every VM gets only one request at a time. So we have configured our app so that, if another request arrives while one is being processed, we simply send a BUSY code back (a non-200 response). This fails the health probe, but we have no idea when and how the load balancer adds the VM back to the pool, since in reality the VM was just busy and NOT in poor health. Since Azure LB doesn't understand the application running on the VMs, we didn't know how else to solve this.

But with the above approach we are seeing a lot of timeouts, poor utilisation of existing VMs, etc., which makes us wonder whether the VMs marked as being in "poor health" are ever brought back into the pool. The Azure documentation and support are really poor. Any pointers, please?

1 solution

As per the documentation, Load Balancer operates at layer 4 and doesn't provide application-layer gateway functionality. You could try the following steps to better understand the workflow and configure your LB accordingly for better efficiency.

You can set up monitoring for your load balancer and go through some of the metrics, such as Flow Distribution. This view can give you feedback on whether your Load Balancer configuration or traffic patterns are leading to imbalanced traffic. This can happen, for example, if you have session affinity configured and a single client is making a disproportionate number of requests.
As mentioned in the documentation you shared above, if a health probe fails, that backend is marked as unhealthy, and if the next health probe succeeds, the backend is marked as healthy again (the unhealthy threshold you mentioned also plays a role here). You can try tuning the health probe interval to maximize the efficiency of your VMs.
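To reason about how interval and threshold interact, here is a rough simulation of the simplified model described in this answer: a backend goes unhealthy after a number of consecutive failed probes and comes back on the next successful one. The function name and parameters are hypothetical, and this is only a sketch of that model, not Azure's actual implementation, so check the linked documentation for the exact rules.

```python
from typing import Iterable, List


def simulate_probe_states(results: Iterable[bool], unhealthy_threshold: int) -> List[bool]:
    """Return the backend's health state after each probe.

    Assumed model: the backend starts healthy, is marked unhealthy after
    `unhealthy_threshold` consecutive failed probes, and is marked healthy
    again as soon as one probe succeeds.
    """
    healthy = True
    consecutive_failures = 0
    states = []
    for probe_ok in results:
        if probe_ok:
            consecutive_failures = 0
            healthy = True
        else:
            consecutive_failures += 1
            if consecutive_failures >= unhealthy_threshold:
                healthy = False
        states.append(healthy)
    return states


# One probe per interval: OK, BUSY, BUSY, OK with a threshold of 2.
print(simulate_probe_states([True, False, False, True], unhealthy_threshold=2))
# prints [True, True, False, True]
```

Under this model, a VM that answers BUSY to enough consecutive probes stays out of the pool until the next probe that happens to land while it is idle, so a long probe interval translates directly into idle time.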