
Why am I getting an error that the nginx web server is down or busy?

Suddenly, my Django website has stopped being served over the internet, and I have no idea what changed.

When I open the website in a browser, I get a bunch of error messages (screenshot attached). The error complains about the web server (nginx) that hosts my website.

My environment:

Ubuntu 18, Gunicorn, Nginx

The website is hosted on AWS (inbound/outbound rule screenshots attached).

I have checked the output of sudo journalctl -u nginx.service:

Aug 15 04:15:39 primarySNS.schoolnskill.com systemd[1]: Starting A high performance web server and a reverse proxy server...
Aug 15 04:15:39 primarySNS.schoolnskill.com systemd[1]: Started A high performance web server and a reverse proxy server.
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Stopping A high performance web server and a reverse proxy server...
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Starting A high performance web server and a reverse proxy server...
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: nginx.service: Failed to parse PID from file /run/nginx.pid: Invalid argument
Aug 15 04:18:53 primarySNS.schoolnskill.com systemd[1]: Started A high performance web server and a reverse proxy server.

I can see an "Invalid argument" line, but I'm not sure whether it has anything to do with my situation.
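As far as I know, that journal line is a known, harmless race in Ubuntu 18.04's nginx package: systemd tries to read /run/nginx.pid before nginx has finished writing it, and the "Started ..." line right after shows the service came up anyway. To confirm that nginx really is running, here is a small bash sketch. The function name is hypothetical, and the default path is taken from the journal line. It checks whether the PID file names a live process:

```bash
# Hypothetical helper: does the PID file name a process that is still alive?
# Default path /run/nginx.pid is an assumption taken from the journal line.
check_pidfile() {
  local pidfile=${1:-/run/nginx.pid} pid
  if [ ! -r "$pidfile" ]; then
    echo "missing: $pidfile"
    return 2
  fi
  pid=$(cat "$pidfile")
  # The file must contain one positive integer and nothing else.
  case $pid in
    '' | *[!0-9]*) echo "invalid: '$pid'"; return 1 ;;
  esac
  # On Linux, /proc/<pid> exists for live processes; no need to signal them.
  if [ -d "/proc/$pid" ]; then
    echo "running: $pid"
  else
    echo "stale: $pid"
    return 1
  fi
}
# Usage: check_pidfile && systemctl status nginx --no-pager
```

If it reports "running", the "Failed to parse PID" message can most likely be set aside.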

I have also checked the nginx error log. It is 0 bytes:

-rw-r----- 1 xxx yyy 0 Aug 15 06:25 /var/log/nginx/error.log
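The empty error.log was last modified at 06:25, and the syslog shows rsyslogd being HUPed at 06:25:01. On a stock Ubuntu install, the daily cron jobs (including logrotate) typically run at that time. The file may therefore just have been rotated, with any earlier errors now in error.log.1 or the compressed older files. Here is a minimal sketch that reads them all, assuming the default /var/log/nginx directory (the function name is hypothetical):

```bash
# Hypothetical helper: print current and rotated nginx error logs, plain or gzipped.
# The directory defaults to Ubuntu's /var/log/nginx (an assumption; pass another).
read_error_logs() {
  local dir=${1:-/var/log/nginx} f
  for f in "$dir"/error.log "$dir"/error.log.*; do
    [ -e "$f" ] || continue            # an unmatched glob stays literal; skip it
    if [ ! -s "$f" ]; then
      echo "== $f (empty)"
      continue
    fi
    echo "== $f"
    zcat -f -- "$f"                    # -f passes uncompressed files through as-is
  done
}
# Usage (from a shell that can read the logs, e.g. root): read_error_logs | less
```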

The syslog does have some interesting entries:

Aug 15 06:25:01 primarySNS rsyslogd:  [origin software="rsyslogd" swVersion="8.32.0" x-pid="920" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Aug 15 06:26:07 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 06:26:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 06:26:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 06:26:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 06:35:01 primarySNS CRON[2678]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:45:01 primarySNS CRON[2693]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:50:54 primarySNS gunicorn[1432]: Not Found: /robots.txt
Aug 15 06:53:30 primarySNS gunicorn[1432]: Not Found: /profile1/
Aug 15 06:55:01 primarySNS CRON[2718]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 06:56:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 06:56:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 06:56:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 07:05:01 primarySNS CRON[2734]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:15:01 primarySNS CRON[2750]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:17:01 primarySNS CRON[2756]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 15 07:25:01 primarySNS CRON[2769]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:26:50 primarySNS systemd-networkd[771]: eth0: Configured
Aug 15 07:26:50 primarySNS systemd-timesyncd[600]: Network configuration changed, trying to establish connection.
Aug 15 07:26:50 primarySNS systemd-timesyncd[600]: Synchronized to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 15 07:35:01 primarySNS CRON[2802]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:41:11 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:41:11 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:45:01 primarySNS CRON[2852]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: 2020-08-15 07:50:43 INFO Backing off health check to every 3600 seconds for 10800 seconds.
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: 2020-08-15 07:50:43 ERROR Health ping failed with error - EC2RoleRequestError: no EC2 instance role found
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: caused by: EC2MetadataError: failed to make EC2Metadata request
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: #011status code: 404, request id:
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: caused by: <?xml version="1.0" encoding="iso-8859-1"?>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: #011"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:  <head>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:   <title>404 - Not Found</title>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:  </head>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:  <body>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:   <h1>404 - Not Found</h1>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]:  </body>
Aug 15 07:50:43 primarySNS amazon-ssm-agent.amazon-ssm-agent[898]: </html>
Aug 15 07:52:26 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:52:26 primarySNS systemd-resolved[784]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 15 07:55:01 primarySNS CRON[2870]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
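Most of this syslog is routine noise, such as the sysstat cron job and time synchronisation. The two gunicorn "Not Found" entries at 06:50 and 06:53 suggest that Django was still receiving some requests at that point. To pull out only the web-stack entries, here is a small awk-based filter. The function name is hypothetical, and it assumes the classic syslog layout shown above (month, day, time, host, then program[pid]:):

```bash
# Hypothetical helper: keep only syslog lines whose program tag (5th field,
# e.g. "gunicorn[1432]:") is one of the given names. Reads from stdin.
filter_syslog() {
  local re
  re="^($(IFS='|'; echo "$*"))\$"      # e.g. ^(nginx|gunicorn)$
  awk -v re="$re" '{
    tag = $5
    sub(/\[.*$/, "", tag)              # drop "[pid]:"
    sub(/:$/, "", tag)                 # tags without a pid end in ":" only
    if (tag ~ re) print
  }'
}
# Usage: filter_syslog nginx gunicorn < /var/log/syslog
```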

Follow-up questions and their answers:

Are you using Route 53 as the DNS resolver?

Yes

Has your EC2 instance been stopped and started again, and if so, have you checked that the IP address is still the same?

Yes, but I have made sure that the new IP is updated in Route 53.
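Since the instance was stopped and started, it is worth confirming that the public DNS actually returns the instance's current public IP. Checking only what the Route 53 console shows is not enough, because resolvers may keep serving the old record until its TTL expires. Here is a minimal sketch; the function name is hypothetical and example.com stands in for the real domain:

```bash
# Hypothetical helper: succeed if the expected IP appears among the addresses
# read from stdin (one per line, e.g. the output of `dig +short`).
ip_in_list() {
  local expected=$1
  grep -qxF -- "$expected"
}
# Usage (example.com is a placeholder for your domain):
#   public_ip=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
#   dig +short example.com | ip_in_list "$public_ip" && echo "DNS matches" \
#     || echo "DNS points elsewhere"
```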

Is your EC2 instance in a public subnet? Can you reach google.com or 8.8.8.8 from the command line on it?

ping google.com
PING google.com (172.217.2.110) 56(84) bytes of data.
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=1 ttl=112 time=1.31 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=2 ttl=112 time=1.29 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=3 ttl=112 time=1.33 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=4 ttl=112 time=1.33 ms
64 bytes from yyz10s05-in-f14.1e100.net (172.217.2.110): icmp_seq=5 ttl=112 time=1.34 ms

Is nginx actually listening on the EC2 instance? If you SSH to it and run curl -vvvv http://localhost/, do you actually get a response?

*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 15 Aug 2020 13:56:06 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Fri, 10 Jul 2020 11:16:00 GMT
< Connection: keep-alive
< ETag: "5f084df0-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host localhost left intact
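The 200 response above is nginx's stock "Welcome to nginx!" page, not the Django application, so this request is not being proxied to Gunicorn. Here is a tiny check that uses the title string from that output (the function name is hypothetical):

```bash
# Hypothetical helper: succeed if the HTML on stdin is nginx's stock welcome page.
is_default_nginx_page() {
  grep -qF '<title>Welcome to nginx!</title>'
}
# Usage: curl -s http://localhost/ | is_default_nginx_page && echo "default site answered"
```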

What happens when you run curl -vvvv http://(ec2.public.ip.address)/?

* Rebuilt URL to: http://<public_ip>/
*   Trying <public_ip>...
* TCP_NODELAY set
* Connected to <public_ip> (<public_ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <public_ip>
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx/1.14.0 (Ubuntu)
< Date: Sat, 15 Aug 2020 13:57:13 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Fri, 10 Jul 2020 11:16:00 GMT
< Connection: keep-alive
< ETag: "5f084df0-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host <public_ip> left intact
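The public IP returns the same stock page. With a bare IP, curl sends Host: <public_ip>, so nginx answers from whichever server block is the default for port 80. If the Django site is configured with a server_name, it is usually selected only when the request carries that name. Here is a sketch for comparing the two. The function name is hypothetical, and <public_ip> and example.com are placeholders:

```bash
# Hypothetical helper: print the <title> of the page nginx returns at $ip
# when the request carries "Host: $host".
title_for_host() {
  local ip=$1 host=$2
  curl -s --max-time 5 -H "Host: $host" "http://$ip/" \
    | sed -n 's:.*<title>\(.*\)</title>.*:\1:p'
}
# Usage (placeholders):
#   title_for_host <public_ip> <public_ip>    # stock page, as in the output above
#   title_for_host <public_ip> example.com    # should be the Django site's title
```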

Is your site running at the path or virtual domain you think it is? Has your nginx config perhaps changed?

I didn't make any changes to my nginx configuration.
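Even without intentional edits, it is cheap to confirm which server blocks nginx actually loads, for example whether the Django site is still enabled alongside the default one. nginx -T tests the configuration and prints every loaded file, each preceded by a "# configuration file ...:" comment. Below is a rough awk summary of that dump. The function name is hypothetical, and it only recognises blocks written as "server {":

```bash
# Hypothetical helper: summarize `nginx -T` output read from stdin, one line per
# server block, listing its listen and server_name directives.
list_server_blocks() {
  awk '
    /^# configuration file / { file = $4; sub(/:$/, "", file) }
    $1 == "server" && $2 == "{" { n++; info[n] = file ":" }
    ($1 == "listen" || $1 == "server_name") && n {
      line = $0; sub(/^[ \t]+/, "", line); sub(/;.*$/, "", line)
      info[n] = info[n] " [" line "]"
    }
    END { for (i = 1; i <= n; i++) print info[i] }
  '
}
# Usage: sudo nginx -T 2>/dev/null | list_server_blocks
```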

What happens if you run curl http://169.254.169.254/latest/meta-data - do you get a response?

curl http://169.254.169.254/latest/meta-data
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hibernation/
hostname
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-hostname
public-ipv4
public-keys/
reservation-id
security-groups

Based on the comments:

I went to the OP's website URL and the website was running. Therefore, there doesn't seem to be any issue with the EC2 instance or its settings.

It should be noted that the site works only over HTTP, not HTTPS, so attempts to access it using https:// will fail. This could explain why it was not reachable when tested initially.
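To check both schemes against the instance without depending on DNS, curl's --resolve option can pin the domain to a given IP while still sending the real name as the Host header (and as SNI for HTTPS). Here is a sketch; the function name is hypothetical and the domain and IP are placeholders. The -k flag skips certificate verification so that a missing or mismatched certificate does not hide whether anything answers on port 443 at all:

```bash
# Hypothetical helper: print "<scheme> <HTTP status>" for http and https.
# 000 means no HTTP response (e.g. connection refused or timed out).
probe_vhost() {
  local domain=$1 ip=$2 scheme port code
  for scheme in http https; do
    if [ "$scheme" = http ]; then port=80; else port=443; fi
    code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 \
      --resolve "$domain:$port:$ip" "$scheme://$domain/") || true
    echo "$scheme ${code:-000}"
  done
}
# Usage (placeholders): probe_vhost example.com <public_ip>
```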

Without more information this is hard to debug. Things to check:

  1. Are you using Route 53 as the DNS resolver? Has your EC2 instance been stopped and started again, and if so, have you checked that the IP address is still the same?

  2. Is your EC2 instance in a public subnet? Can you reach google.com or 8.8.8.8 from the command line on it?

  3. Is nginx actually listening on the EC2 instance? If you SSH to it and run curl -vvvv http://localhost/, do you actually get a response?

  4. What happens when you run curl -vvvv http://(ec2.public.ip.address)/ ?

  5. As above, what happens with curl -k -vvvv https://(ec2.public.ip.address)/ ?

  6. Is your site running at the path or virtual domain you think it is? Has your nginx config perhaps changed?

  7. What happens if you run curl http://169.254.169.254/latest/meta-data - do you get a response?
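For checks 3 to 5, a quick way to see on the instance itself whether anything is listening on ports 80 and 443 is ss -ltn. The helper below is hypothetical. It reads that output on stdin and reports each requested port, assuming the usual column layout where the local address:port is the 4th field:

```bash
# Hypothetical helper: report whether something listens on each given TCP port,
# reading `ss -ltn` output on stdin.
check_listening() {
  local out port
  out=$(cat)
  for port in "$@"; do
    # Local address is the 4th column, e.g. 0.0.0.0:80 or [::]:443; skip header.
    if printf '%s\n' "$out" \
        | awk -v p="$port" 'NR > 1 && $4 ~ (":" p "$") { found = 1 } END { exit !found }'; then
      echo "$port listening"
    else
      echo "$port not listening"
    fi
  done
}
# Usage: ss -ltn | check_listening 80 443
```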

The SSM agent timeout is curious.
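The SSM agent lines in the syslog report "no EC2 instance role found", and the metadata listing in the question has no iam/ entry. As far as I know, the metadata service exposes iam/ only when an instance profile (IAM role) is attached. That would explain the agent backing off its health checks, and it is probably unrelated to nginx. Here is a sketch of that check, reading the listing on stdin (the function name is hypothetical):

```bash
# Hypothetical helper: succeed if the instance metadata listing on stdin
# includes the iam/ category (assumed to appear only with an instance profile).
has_instance_role() {
  grep -qx 'iam/'
}
# Usage: curl -s http://169.254.169.254/latest/meta-data/ | has_instance_role \
#          || echo "no instance profile attached"
```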

As an aside, your security group egress rules are unnecessarily complex. You can remove the HTTP, HTTPS and SSH rules because your "All traffic" rule already allows that traffic.
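To remove the redundant rules from the CLI rather than the console, aws ec2 revoke-security-group-egress accepts an --ip-permissions JSON document. The sketch below builds that document for a single TCP port. The function name, group ID, 0.0.0.0/0 CIDR and port list are all assumptions, so compare them with your actual rules (aws ec2 describe-security-groups) before revoking anything:

```bash
# Hypothetical helper: build the --ip-permissions JSON for one TCP egress rule.
# The default CIDR 0.0.0.0/0 is an assumption; pass the one your rule uses.
egress_rule_json() {
  local port=$1 cidr=${2:-0.0.0.0/0}
  printf '[{"IpProtocol":"tcp","FromPort":%d,"ToPort":%d,"IpRanges":[{"CidrIp":"%s"}]}]' \
    "$port" "$port" "$cidr"
}
# Usage (sg-0123456789abcdef0 is a placeholder group ID):
#   for p in 22 80 443; do
#     aws ec2 revoke-security-group-egress --group-id sg-0123456789abcdef0 \
#       --ip-permissions "$(egress_rule_json "$p")"
#   done
```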
