EC2 instance is out of service in load balancer

I have an EC2 instance up and running, and a load balancer that is associated with it. The health check is configured as follows:

Ping Target         : HTTP:3001/healthCheck
Timeout             : 5 seconds
Interval            : 24 seconds
Unhealthy threshold : 2
Healthy threshold   : 10

Now the instance is shown as OutOfService. I even tried changing the listening ports. Things were working until I rebooted my EC2 instance. Any help would be highly appreciated.

For reference: I have a Rails app running on port 3001, and one listener forwarding HTTP:80 (load balancer) to HTTP:3001 (instance).

I have also checked over SSH, in the terminal, that the app is working.

Suggestion#1:

If the current state of some or all of your instances is OutOfService, and the description field says that the instance has failed at least the Unhealthy Threshold number of health checks consecutively, then the instances have failed the load balancer health check.

The issues to look for, their potential causes, and the steps to resolve them are covered in the AWS documentation: Troubleshoot a Classic Load Balancer: Health Checks

Suggestion#2:

chrisa_pm has given some advice for this issue:

If you can confirm that your EC2 instance is reachable, you can remove it from your Load Balancer and add it back again. The Load Balancer will recognize it after a few minutes.

Keep in mind that the instance has to pass the check exactly as it is set in your Health Check configuration:

  1. For HTTP:80 you need to specify a page that is actually reachable (like index.html).
  2. For TCP:80 the load balancer only needs access to TCP port 80.
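
To see what an HTTP health check would request, it can help to turn the configured ping target into a URL and fetch it by hand. The following is a minimal sketch, not part of the original advice: the helper name ping_target_to_url is hypothetical, it handles only HTTP and HTTPS targets, and it assumes the target has the form PROTOCOL:PORT/PATH as in the question.

```shell
#!/bin/sh
# Hypothetical helper: convert an ELB ping target such as
# "HTTP:3001/healthCheck" into the URL to request on a given host.
ping_target_to_url() {
    target=$1
    host=$2
    proto=${target%%:*}      # text before the first ":"   -> HTTP
    rest=${target#*:}        # text after the first ":"    -> 3001/healthCheck
    port=${rest%%/*}         # text before the first "/"   -> 3001
    case "$rest" in
        */*) path="/${rest#*/}" ;;   # keep the path, restoring the leading "/"
        *)   path="/" ;;             # no path given
    esac
    case "$proto" in
        HTTP|HTTPS) ;;
        *) echo "unsupported protocol: $proto" >&2; return 1 ;;
    esac
    scheme=$(printf '%s' "$proto" | tr 'A-Z' 'a-z')
    echo "$scheme://$host:$port$path"
}

# Example with the ping target from the question:
ping_target_to_url "HTTP:3001/healthCheck" "localhost"

# To reproduce the check (assumes curl is installed), print only the status code:
#   curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 \
#       "$(ping_target_to_url HTTP:3001/healthCheck localhost)"
```

Running the curl line on the instance itself only proves the app answers locally; running it from another host in the same VPC also exercises the security group rules. An HTTP health check on a Classic Load Balancer passes only on a 200 response within the timeout.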

Suggestion#3:

qh2 solved the problem in the following way:

Create a startup service that deregisters your instance and then registers it again.

Example: file awsloadbalancer

#!/bin/sh
# chkconfig: 2345 95 20
# description: Re-registers this instance with the load balancer at boot.
#
# When an instance is stopped, it drops out of the load balancer.
# This script re-registers it on startup.

case "$1" in
  start)
    aws --region eu-west-1 elb deregister-instances-from-load-balancer --load-balancer-name test --instances i-3c339b7c
    aws --region eu-west-1 elb register-instances-with-load-balancer --load-balancer-name test --instances i-3c339b7c
    ;;
  stop)
    echo "stopping aws instances"
    ;;
  restart)
    echo "Restarting aws, nothing to do"
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}"
    exit 1
    ;;
esac

Create the file in /etc/init.d/, then register it as a service.

Suggestion#4:

Kenneth Snyder also solved a specific ELB issue:

I also had a similar issue, but I was able to fix it.

I had created a security group for the ELB which accepts requests on port 80 and forwards them to EC2 on port 80. The security group created earlier for EC2 also had inbound rules for port 80 and RDP.

Still, the instances were showing as OutOfService under the ELB. Later I added another inbound rule to the EC2 security group, allowing port 80 from the SG that was created for the ELB, and that worked.

I guess the ELB's SG has to be allowed in the rules of each instance's SG. Hope that helps.
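
The rule described above can also be added from the AWS CLI. This is a sketch under assumptions: both security group IDs are placeholders, the helper name allow_elb_cmd is hypothetical, and the script only prints the command so it can be reviewed before being run with real IDs.

```shell
#!/bin/sh
# Placeholder IDs: substitute your own security groups.
INSTANCE_SG="sg-0123456789abcdef0"   # SG attached to the EC2 instance
ELB_SG="sg-0fedcba9876543210"        # SG attached to the load balancer
PORT=80                              # port the ELB forwards to and health-checks

# Hypothetical helper: print (do not run) the command that allows inbound TCP
# traffic on the given port from the ELB's SG into the instance's SG.
allow_elb_cmd() {
    instance_sg=$1
    elb_sg=$2
    port=$3
    echo "aws ec2 authorize-security-group-ingress --group-id $instance_sg --protocol tcp --port $port --source-group $elb_sg"
}

allow_elb_cmd "$INSTANCE_SG" "$ELB_SG" "$PORT"
```

For the setup in the question, the port would be 3001 rather than 80, since that is where both the listener and the health check point.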

Resource Link:

https://forums.aws.amazon.com/thread.jspa?messageID=733153

Did you provide a health check endpoint and specify it in the EC2 console? Something like:

[Image: health check snapshot]

Note the port 80 and a valid route. You probably didn't set port 3001 in your nginx / apache config.

In the Rails app, create an action like so:

class HealthCheckController < ActionController::Base
  def ping
    head :ok
  end
end

and a route:

get 'health_check/ping'

The AWS load balancer will ping this endpoint, and if the response is 200 OK enough times in a row (as per the Healthy threshold), it will deem the instance "Healthy".

I see some issues with your ELB health check config. Right now, the instance has to pass 10 consecutive health checks, 24 seconds apart, before the ELB will send it requests. Hence, it takes

24 seconds x 10 = 240 seconds  # about 4 minutes after reboot

Assuming your Unicorn starts faster than that and doesn't die once it is running, you should reduce the health check interval and the healthy threshold.

  • Reduce the Interval to 3-5 seconds.
  • Reduce the Healthy threshold to 2-5 checks.

The above should help the ELB mark the instance "in service" faster.
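
As a rough check of this arithmetic, the minimum time before an instance can be put back in service is the interval multiplied by the healthy threshold. The sketch below assumes every check passes from the first one; the helper name time_to_in_service is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: seconds of consecutive passing checks needed before
# the instance can be marked InService.
time_to_in_service() {
    interval=$1            # seconds between health checks
    healthy_threshold=$2   # consecutive successes required
    echo $((interval * healthy_threshold))
}

time_to_in_service 24 10   # settings from the question -> 240
time_to_in_service 5 2     # reduced settings           -> 10
```

With the reduced settings, a healthy instance would be eligible for traffic roughly ten seconds after it starts answering, instead of four minutes.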

This assumes that your server is properly set up to serve /healthCheck on port 3001 to external hosts. If that is not the case, please check your firewall, security groups, and server config.

The problem was that, after I rebooted the instance, AWS assigned a new IP to the EC2 instance, which I didn't notice.

I was still logging in through SSH to the old EC2 instance, and hence curl never failed either.

(I am quite curious why this IP address was still active; when I last checked, it was still active even after 15 days.)

Nevertheless, great checkpoints (in general) by SkyWalker.

Finally, what I had to do:

With the new IP, my pem file also got messed up. So I created a new instance and a new pem file, then adjusted the load balancer and the security groups accordingly to point to this instance.

PS: I couldn't be any more stupid.

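Make sure the app server's security group allows the ELB's security group to reach the health check endpoint on the port you specified in the health check.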
