
nginx reverse proxy not detecting dropped load balancer

We have the following config for our reverse proxy:

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $sometoken $1;
    set $some_detokener "foo";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Authorization "Basic $do_token_decoding";
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_redirect https://place/ https://place_with_token/$1/;
    proxy_redirect http://place/ http://place_with_token/$1/;
    resolver 10.0.0.2 valid=10s;
    set $backend https://real_storage$2;
    proxy_pass $backend;
}

Now, all of this works... until real_storage rotates a server. For example, say real_storage comes from foo.com, a load balancer which directs to two servers: 1.1.1.1 and 1.1.1.2. Now 1.1.1.1 is removed and replaced with 1.1.1.3. However, nginx continues to try 1.1.1.1, resulting in:

epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while connecting to upstream, client: ..., server: ..., request: "GET... HTTP/1.1", upstream: "https://1.1.1.1:443/...", host: "..."

Note that the upstream is the old server, shown by a previous log:

[debug] 1888#1888: *570837 connect to 1.1.1.1:443, fd:60 #570841

Is this something misconfigured on our side, or on the host for our real_storage?

The best I could find that comes even close to my issue is https://mailman.nginx.org/pipermail/nginx/2013-March/038119.html ...

Further Details

We added proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504; and it still failed. I am now beginning to suspect that, since there are two ELBs involved (ours and theirs), the resolver we are using is the problem: it is Amazon-specific (per https://serverfault.com/a/929517/443939), so Amazon still sees the old address as valid, but it won't resolve externally (our server trying to hit theirs).
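For reference, a minimal sketch of where that directive sits in the location block from the question (everything else unchanged). As I understand it, proxy_next_upstream can only help if nginx currently knows more than one address for the upstream name, so it cannot recover from a stale single-address cache:

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    # try the next resolved address on these failures
    proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
    resolver 10.0.0.2 valid=10s;
    set $backend https://real_storage$2;
    proxy_pass $backend;
}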

I have removed the resolver altogether from one configuration and will see where that goes. We have not been able to reproduce this using internal servers, so we must rely on waiting for the third-party servers to cycle (about once per week).

I'm a bit uncertain that the resolver is the issue, if only because a restart of nginx solves the problem and picks up the latest IP pair :/

Is it possible that I have to set the DNS variable without the https:// scheme, like this?

    set $backend real_storage$2;
    proxy_pass https://$backend;

I know that you have to use a variable or else the re-resolve won't happen, but maybe it matters which part of the URL goes into the variable; I have only ever seen it set up as above in my searches, and no reason was ever given. I'll set that up on a 2nd server and see what happens.

And for my 3rd server I am trying this comment and moving the set outside of the location. Of course, if anybody else has a concrete idea, I'm open to changing my testing for this go-round :D

set $rootbackend https://real_storage;
location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $backend $rootbackend$2;
    proxy_pass $backend;
}

Note that I still have to set $backend inside the location, though, because it uses the capture variable $2.

There's nothing special about using variables within http://nginx.org/r/proxy_pass : any variable use will make nginx involve the resolver on each request (provided the name isn't found in a defined server group; perhaps you have a clash there?). You can even get rid of $backend if you're already using $2 in there.
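A minimal sketch of that simplification, assuming the same regex and resolver from the question; because $2 appears in the URL, nginx still resolves real_storage at request time without the intermediate $backend variable:

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    resolver 10.0.0.2 valid=10s;
    # $2 makes the URL variable, so the resolver is consulted per request
    proxy_pass https://real_storage$2;
}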

As for interpreting the error message, you have to figure out whether this happens because the existing connections get dropped, or because nginx is still trying to connect to the old addresses.

You might also want to look into lowering the *_timeout values within http://nginx.org/en/docs/http/ngx_http_proxy_module.html ; they all appear to default to 60s, which may be too long for your use case:
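For example, a sketch with illustrative (assumed) values; proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout all default to 60s:

    # give up on an unreachable address quickly instead of waiting 60s
    proxy_connect_timeout 5s;
    proxy_send_timeout    15s;
    proxy_read_timeout    15s;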

I'm not surprised that you're not able to reproduce this issue, because there doesn't seem to be anything wrong with your existing configuration; perhaps the problem manifested itself in an earlier revision?

As @cnst correctly noted, using a variable in proxy_pass makes nginx resolve the address of real_storage for every request, but there are further details:

Before version 1.1.9, nginx cached DNS answers for 5 minutes.

Since version 1.1.9, nginx caches DNS answers for a duration equal to their TTL, and the default TTL of Amazon ELB is 60 seconds.

So it is entirely expected that after a rotation nginx keeps using the old address for some time. As per the documentation, the expiration time of the DNS cache can be overridden:

resolver 127.0.0.1 [::1]:5353 valid=10s;

or

resolver 127.0.0.1 ipv6=off valid=10s;
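Putting the two together, a minimal sketch based on the question's own config (the 10.0.0.2 resolver and the real_storage name are assumed from there): valid=10s caps the cache at 10 seconds instead of honouring the 60-second ELB TTL, and the variable in proxy_pass forces the per-request lookup:

resolver 10.0.0.2 valid=10s;  # re-resolve at most every 10s

location ~ ^/stuff/([^/]*)/stuff(.*)$ {
    set $backend https://real_storage$2;  # variable forces resolution per request
    proxy_pass $backend;
}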
