
NGINX location block regex and proxy pass

I hope all of you are well.

I am a beginner with NGINX and I am trying to understand the following NGINX config file block. I would be really grateful if someone could help me understand this block.

location ~ ^/search/google(/.*)?$ {
  set $proxy_uri $1$is_args$args;
  proxy_pass http://google.com$proxy_uri;
}

From the following SO answer (https://stackoverflow.com/a/59846239), I understand that:

  • For the location ~ ^/search/google(/.*)?$

    • ~ means that it will perform a regex search (case-sensitive)
    • ^/search/google means that the route should start with /search/google (e.g. http://<ip or domain>/search/google). Is there any difference if there is a trailing / at the end (e.g. http://<ip or domain>/search/google/ instead of http://<ip or domain>/search/google)?
    • (/.*)?$ is the part I'm a bit confused about:
      • Why use a () group in this case? What is a common use case for groups?
      • Why use ? in this case? Doesn't .* already match any character zero or more times? Why do we still need the ?
      • Can we simply remove the () and ?, as in /search/google/.*$, and get the same behavior as the original?
  • set $proxy_uri $1$is_args$args;

    • I understand that we are setting a user-defined variable called proxy_uri
    • What will $1 be replaced with? Sometimes people also include $2 and so on.
    • I think $is_args$args means that if there's a query string (e.g. http://<ip or domain>/search/google?fruit=apple), $is_args$args will be replaced with ?fruit=apple
  • proxy_pass http://google.com$proxy_uri

    • I would assume it just redirects the user to http://google.com$proxy_uri? Is it the same as an HTTP 301 redirect?

Thank you very much in advance!

Being a non-native English speaker, I hoped someone would answer your question in better English than mine, but since no one has done so in the last five days, I'll try to answer it myself.

~ means that it will perform a regex search (case-sensitive)

I think the more correct term is "perform matching against a regex pattern".
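To get a feel for what `~` does, you can mimic the match with Python's `re` module, which for a simple pattern like this one behaves the same way as the PCRE engine nginx uses. This is only a sketch: `location_matches` is a made-up helper, and it assumes the input is the request path without the query string (nginx tests locations against the normalized URI, not against the arguments). `~*` is nginx's case-insensitive counterpart to `~`, modelled here with `re.IGNORECASE`.

```python
import re

# The pattern from the question's location block.
PATTERN = r"^/search/google(/.*)?$"

def location_matches(uri: str, case_insensitive: bool = False) -> bool:
    """Hypothetical helper: roughly mimics `location ~` (or `location ~*`
    when case_insensitive=True) testing a request path against PATTERN."""
    flags = re.IGNORECASE if case_insensitive else 0
    # re.search is unanchored by itself; the ^ and $ in the pattern do the anchoring.
    return re.search(PATTERN, uri, flags) is not None

print(location_matches("/search/google/maps"))                         # True
print(location_matches("/Search/Google/maps"))                         # False: `~` is case-sensitive
print(location_matches("/Search/Google/maps", case_insensitive=True))  # True, like `~*`
```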

^/search/google means that the route should start with /search/google (e.g. http://<ip or domain>/search/google). Is there any difference if there is a trailing / at the end (e.g. http://<ip or domain>/search/google/ instead of http://<ip or domain>/search/google)?

Will be answered below.

Why use a () group in this case? What is a common use case for groups?

This is a numbered capturing group. The part of the string matched by this group can be referenced later as $1. A second numbered capturing group, if present in the regex pattern, can be referenced as $2, and so on. There are also named capturing groups, which let you use your own variable name instead of $1, $2, etc. A good example of using named capturing groups is given in this ServerFault thread.
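Here is a small Python illustration of numbered versus named groups. The pattern `^/search/([a-z]+)(/.*)?$` and the group names `engine` and `rest` are hypothetical variations for illustration, not part of the question's config. The `(?P<name>...)` syntax is accepted by Python and, as far as I know, also by PCRE, which nginx uses; in nginx a named capture would then be available as a variable such as `$engine`.

```python
import re

# Numbered groups: each (...) is captured and referenced by position,
# which is what nginx exposes as $1, $2, ...
numbered = re.match(r"^/search/([a-z]+)(/.*)?$", "/search/google/maps")
print(numbered.group(1))  # google  -> would be $1
print(numbered.group(2))  # /maps   -> would be $2

# Named groups: same pattern, but each group gets its own name
# (hypothetical names; in nginx these would become $engine and $rest).
named = re.match(r"^/search/(?P<engine>[a-z]+)(?P<rest>/.*)?$", "/search/google/maps")
print(named.group("engine"))  # google
print(named.group("rest"))    # /maps
```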

BTW, the answer you referenced mentions numbered capturing groups (but not named ones).

Why use ? in this case? Doesn't .* already match any character zero or more times? Why do we still need the ?

Did you notice our capturing group is (/.*), not (.*)? This way the pattern matches /search/google/<any suffix> but not /search/googles etc. The question mark makes this capturing group optional (so /search/google matches our regex pattern too).
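The following Python sketch prints what $1 would hold for a few request paths, which also answers the trailing-slash question: /search/google and /search/google/ both match, but $1 is empty for the first and "/" for the second. `capture_1` is a hypothetical helper. Python reports an optional group that did not participate as `None`, while nginx's $1 is then simply empty, so the sketch normalizes that case to `""`.

```python
import re

PATTERN = re.compile(r"^/search/google(/.*)?$")

def capture_1(uri: str):
    """Hypothetical helper: what nginx would put in $1, or None if the
    location does not match at all."""
    m = PATTERN.match(uri)
    if m is None:
        return None
    # An unmatched optional group is None in Python; nginx treats it as empty.
    return m.group(1) or ""

for uri in ["/search/google", "/search/google/", "/search/google/maps/place", "/search/googles"]:
    print(f"{uri!r:30} -> $1 = {capture_1(uri)!r}")
```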

Can we simply remove the () and ?, as in /search/google/.*$, and get the same behavior as the original?

No, because we need that $1 value later. If you have followed everything above, you should see that $1 is either /<any suffix> or an empty string.
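A quick Python comparison of the two patterns shows both differences. The simplified pattern has no capturing group, so there is nothing for $1 to hold. And without the `?` the slash becomes mandatory, so the bare path /search/google no longer matches at all. As before, Python's `re` is only standing in for nginx's regex engine here.

```python
import re

original = re.compile(r"^/search/google(/.*)?$")
simplified = re.compile(r"^/search/google/.*$")  # the variant proposed in the question

# Number of capturing groups: the simplified pattern has none, so no $1.
print(original.groups, simplified.groups)  # 1 0

# Without the "?", the slash is required, so the bare path stops matching.
print(bool(original.match("/search/google")))    # True
print(bool(simplified.match("/search/google")))  # False
```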

What will $1 be replaced with? Sometimes people also include $2 and so on.

Already answered.

I think $is_args$args means that if there's a query string (e.g. http://<ip or domain>/search/google?fruit=apple), $is_args$args will be replaced with ?fruit=apple

Yes, exactly.
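To make the string assembly in `set $proxy_uri $1$is_args$args;` concrete, here is a minimal Python sketch. `build_proxy_uri` is a hypothetical helper; it relies on nginx's documented behavior that $is_args is "?" when the request line has arguments and empty otherwise, and that $args is the query string without the leading "?". Note that in your own example (no suffix after /search/google) $1 is empty, so $proxy_uri is just ?fruit=apple.

```python
def build_proxy_uri(capture_1: str, args: str) -> str:
    """Hypothetical helper mimicking `set $proxy_uri $1$is_args$args;`."""
    is_args = "?" if args else ""  # $is_args: "?" only when there are arguments
    return capture_1 + is_args + args

# /search/google/maps?fruit=apple  ->  $1 = "/maps", $args = "fruit=apple"
proxy_uri = build_proxy_uri("/maps", "fruit=apple")
print(proxy_uri)                        # /maps?fruit=apple
print("http://google.com" + proxy_uri)  # http://google.com/maps?fruit=apple

# /search/google?fruit=apple  ->  $1 = "", $args = "fruit=apple"
print(build_proxy_uri("", "fruit=apple"))  # ?fruit=apple
```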

I would assume it just redirects the user to http://google.com$proxy_uri? Is it the same as an HTTP 301 redirect?

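Totally wrong. A 301 redirect only tells the client to send a new request to a different URL. With proxy_pass, nginx acts as a reverse proxy: it sends the request to http://google.com$proxy_uri itself and passes the upstream response back, so the client keeps talking to nginx instead of being sent elsewhere. The difference is briefly described here, although that answer doesn't mention that you can additionally modify the response before sending it to the client (for example, using the sub_filter module).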
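The contrast can be sketched as a toy Python model. This is a conceptual illustration only, not how nginx is implemented: `redirect_301`, `reverse_proxy` and `fake_upstream` are all hypothetical names, and the fake upstream stands in for google.com so no network access is needed. The optional `transform` hook marks the point where something like sub_filter could alter the body before it reaches the client.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)

def redirect_301(target_url: str) -> Response:
    """Roughly what `return 301 <url>;` hands the client: no content from the
    target, just a Location header telling the client where to go next."""
    return 301, {"Location": target_url}, ""

def reverse_proxy(target_url: str,
                  fetch: Callable[[str], Response],
                  transform: Callable[[str], str] = lambda body: body) -> Response:
    """Conceptually what proxy_pass does: the proxy fetches target_url itself
    and relays the upstream response to the client."""
    status, headers, body = fetch(target_url)
    # Before relaying, the proxy may rewrite the body (the role sub_filter can play).
    return status, headers, transform(body)

def fake_upstream(url: str) -> Response:
    """Stand-in for google.com so the example runs offline."""
    return 200, {"Content-Type": "text/html"}, f"<html>content of {url}</html>"

print(redirect_301("http://google.com/maps"))
print(reverse_proxy("http://google.com/maps", fake_upstream))
```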
