
Why is Azure Load Balancer still sending traffic to nodes after the health probe is down?

I have two Azure VMs sitting behind a Standard Azure Load Balancer.

The load balancer has a health probe that sends an HTTP request to /health on each VM every 5 seconds.

The interval is set to 5 seconds, the port to 80, the path to /health, and the "unhealthy threshold" to 2.

During deployment of an application, we set the /health endpoint to return 503 and then wait 35 seconds so that the load balancer can mark the instance as down and stop sending it new traffic.
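As a rough check of the timing above, the arithmetic can be sketched as follows. This assumes the probe marks an instance down after `unhealthy_threshold` consecutive failed probes, one per interval; the function name is hypothetical and the exact Azure probe timing (timeouts, jitter) is not modeled.

```python
def seconds_until_marked_down(interval_s: float, unhealthy_threshold: int) -> float:
    """Approximate upper bound on the time the probe needs to see enough failures.

    Assumption: one probe per interval, and the instance is marked down after
    `unhealthy_threshold` consecutive failures.
    """
    return interval_s * unhealthy_threshold


detection = seconds_until_marked_down(5, 2)  # settings from the question
margin = 35 - detection                      # slack left in the 35-second wait
```

Under that assumption the 35-second wait should be long enough for the probe to fail, so the remaining traffic is unlikely to be explained by the wait being too short.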

However, the load balancer does not seem to take the VM fully out of rotation. It still sends inbound traffic to the down instance, causing downtime for our customers.

I can see in the IIS logs that the /health endpoint is indeed returning 503 when it should.

Any ideas what's wrong? Could it be some sort of TCP keep-alive?

Load Balancer is a pass-through service: it does not terminate TCP connections, and the flow is always between the client and the VM's guest OS and application. If a backend endpoint's health probe fails, established TCP connections to that endpoint continue, but the load balancer stops sending new flows to the unhealthy instance. This is by design, to give you the opportunity to shut down gracefully from the application and avoid an unexpected, sudden termination of ongoing application workflows.
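Because established connections survive a failed probe, the application itself has to ask keep-alive clients to reconnect. A minimal sketch of that idea, using Python's standard `http.server` rather than IIS: while a `draining` flag is set, /health returns 503 and every response carries `Connection: close`, so the client's next request opens a new flow that the load balancer can route elsewhere. The names `draining`, `Handler`, and `make_server` are hypothetical, and this illustrates the technique only; it is not how IIS or Azure implement anything.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: set it at the start of a deployment, clear it afterwards.
draining = threading.Event()


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections

    def do_GET(self):
        if self.path == "/health":
            # Fail the probe while draining so no new flows are assigned.
            status = 503 if draining.is_set() else 200
            body = b"draining" if draining.is_set() else b"ok"
        else:
            status, body = 200, b"hello"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        if draining.is_set():
            # Ask the client to drop this connection after the response;
            # http.server also closes its side when it sends this header.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(host: str, port: int) -> ThreadingHTTPServer:
    """Build the server; the caller decides when to call serve_forever()."""
    return ThreadingHTTPServer((host, port), Handler)
```

To try it, call `make_server("127.0.0.1", 8080).serve_forever()` in one thread and `draining.set()` from another. Long-lived clients that were pinned to this instance should then reconnect on their next request.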

You may also consider configuring TCP reset on idle (https://docs.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-reset) to reduce the number of idle connections.

I got confirmation from Microsoft that this is working "as intended", which makes Azure Load Balancer a bad fit for web applications. This is the answer from Microsoft:

I was able to discuss your observation with the internal team.

They explained that the Load Balancer does not currently have a “Connection Draining” feature and will not terminate existing connections.

Connection Draining is available with Application Gateway.

I heard this is being planned for the Load Balancer as well, as part of the future roadmap. You can also add your voice to the request for this feature for the Load Balancer by filling in the feedback form.

I would suggest the following approach. Place a healthcheck.html page on each of your VMs. As long as the probe can retrieve the page, the load balancer will keep sending user requests to the VM.

When you do the deployment, simply rename the healthcheck.html to a different name such as _healthcheck.html. This will cause the probe to start receiving HTTP 404 errors and will take that machine out of the load balanced rotation.

After your deployment has completed, rename _healthcheck.html back to healthcheck.html. The Azure LB probe will start getting HTTP 200 responses and, as a result, the load balancer will start sending requests to this VM again.
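The rename procedure described above could be scripted roughly as follows. This is a sketch under assumptions: the function name, the `deploy` callback, and the 35-second default are illustrative and not part of Microsoft's answer. As with returning 503, it only stops new flows; connections that are already established are not affected.

```python
import time
from pathlib import Path
from typing import Callable


def deploy_with_probe_toggle(
    site_root: Path,
    deploy: Callable[[], None],
    drain_seconds: float = 35.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hide the probe page, wait, run the deployment, then restore the page."""
    live = site_root / "healthcheck.html"
    parked = site_root / "_healthcheck.html"

    live.rename(parked)    # the probe now receives HTTP 404
    sleep(drain_seconds)   # give the probe time to mark the VM down
    deploy()               # if this raises, the VM stays out of rotation
    parked.rename(live)    # the probe receives HTTP 200 again
```

Leaving the page hidden when `deploy` fails is a deliberate choice in this sketch, so that a broken deployment does not start receiving traffic again.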

Thanks, Manu

