How to set tld_length dynamically on a Rails app with puma (thread safe)

Question

We have a Rails app that responds to multiple TLDs, including subdomains. One of those domains is a .co.uk domain, so in that case the TLD length is 2 (e.g. ourapp.es, ourapp.co.uk, api.ourapp.es, api.ourapp.co.uk).

To change the TLD length dynamically, we've been using this Rack middleware:

class Rack::TldLength

  def initialize(app, host_pattern, host_tld_length)
    @app = app
    @host_pattern = Regexp.new(host_pattern)
    @host_tld_length = host_tld_length
  end

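  def call(env)
    # Remember the process-wide value so it can be restored afterwards.
    original_tld_length = tld_length

    request = Rack::Request.new(env)

    # Overwrite the global setting for hosts that need a different TLD length.
    set_tld_length(@host_tld_length) if request.host =~ @host_pattern

    @app.call(env)
  ensure
    set_tld_length(original_tld_length)
  end

  private

  # Both helpers read/write ActionDispatch::Http::URL.tld_length, which is
  # shared by the whole process.
  def tld_length
    ActionDispatch::Http::URL.tld_length
  end

  def set_tld_length(length)
    ActionDispatch::Http::URL.tld_length = length
  end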
end

This worked until we decided to migrate from Unicorn to Puma. With Unicorn, each request is handled by a separate worker process, one request at a time, so there was no problem. With Puma, however, requests are processed concurrently by different threads of the same process. We suspect that changing the value of ActionDispatch::Http::URL.tld_length is not thread safe, but we're struggling to find an alternative.

It seems that the Rails routing (where we define routes with subdomain constraints) depends on setting the ActionDispatch::Http::URL.tld_length properly.
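To make that dependency concrete, here is a simplified, stand-alone re-implementation of how a subdomain is derived from a host name for a given TLD length. It is modelled on Rails' split-on-dots approach but is not Rails' own code; subdomains_for and subdomain_for are hypothetical names, and IP addresses and other edge cases are ignored.

```ruby
# Simplified sketch (not Rails' actual code): everything left of the
# registrable domain (tld_length labels plus the name itself) is the subdomain.
def subdomains_for(host, tld_length)
  parts = host.split(".")
  parts[0...-(tld_length + 1)] || []
end

def subdomain_for(host, tld_length)
  subdomains_for(host, tld_length).join(".")
end

puts subdomain_for("api.ourapp.es", 1)     # api
puts subdomain_for("api.ourapp.co.uk", 1)  # api.ourapp  (wrong TLD length)
puts subdomain_for("api.ourapp.co.uk", 2)  # api
```

With a TLD length of 1, the .co.uk host yields api.ourapp instead of api, which is why subdomain constraints only match when the TLD length in effect fits the host of the current request.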

Is there any workaround to keep the concurrency offered by having multiple threads while still being able to handle multiple domains with different TLD lengths?

Answer 1

You state that:

It seems that the Rails routing (where we define routes with subdomain constraints) depends on setting the ActionDispatch::Http::URL.tld_length properly.

It seems to me that the easiest way is to normalize the host (the HTTP_HOST entry in the env) so that all host names behave the same way.

For example:

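# Place this middleware at the top of the chain, before any Rails middleware.
class Rack::FixedHost

  # a host_pattern can be: /(foo\.com|foo\.co\.uk|foo\.bor\.co\.uk)$/
  # (escape the dots, otherwise "." matches any character; note that a
  # $-anchored pattern won't match a Host header that includes a port)
  def initialize(app, host_pattern, normalized_host)
    @app = app
    @host_pattern = Regexp.new(host_pattern)
    @normalized_host = normalized_host
  end

  def call(env)
    # The host the client actually requested (falls back to the normalized host).
    env[:ORIGINAL_HOST] = env['HTTP_HOST'.freeze] || @normalized_host
    # The regional domain that matched, e.g. "foo.co.uk".
    env[:ORIGINAL_DOMAIN] = env[:ORIGINAL_HOST].match(@host_pattern).to_a[0] || @normalized_host
    # Rewrite the regional suffix, so Rails always sees e.g. "sub.foo.com".
    env['HTTP_HOST'.freeze] = env[:ORIGINAL_HOST].to_s.sub(@host_pattern, @normalized_host)
    @app.call(env)
  end
end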

To clarify: normalizing a host means rewriting it so that it always ends in the same domain suffix, regardless of the original suffix, which makes subdomain extraction easier.

For example, sub.foo.com, sub.foo.co.uk and sub.foo.bor.co.uk are all normalized to sub.foo.com.

In this example, sub is easy to extract once the different host variations (foo.com, foo.co.uk and foo.bor.co.uk) have all been normalized to the single variation foo.com. Since foo.com has a TLD length of 1, the global tld_length setting never has to change per request.
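A plain-Ruby sketch of the normalization step on its own, assuming the same three regional domains as the comment in the middleware. normalize_host and subdomain are hypothetical helpers, and subdomain is a simplified stand-in for Rails' own extraction. The pattern escapes its dots and uses \z (end of string) rather than $ (end of line).

```ruby
# Regional domains we accept; "\." matches only a literal dot.
HOST_PATTERN = /(foo\.com|foo\.co\.uk|foo\.bor\.co\.uk)\z/
NORMALIZED_DOMAIN = "foo.com"

# Rewrite the regional suffix to the normalized one, as Rack::FixedHost does.
def normalize_host(host)
  host.sub(HOST_PATTERN, NORMALIZED_DOMAIN)
end

# Once every host ends in "foo.com", a single TLD length of 1 is enough.
def subdomain(host, tld_length = 1)
  host.split(".")[0...-(tld_length + 1)].join(".")
end

%w[sub.foo.com sub.foo.co.uk sub.foo.bor.co.uk].each do |host|
  normalized = normalize_host(host)
  puts "#{host} -> #{normalized} (subdomain: #{subdomain(normalized)})"
end
```

Hosts that don't match the pattern are left unchanged by sub, so unrelated domains pass through untouched.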

By default, path helpers (and url_for in views) construct a relative URL, so the actual host name isn't important.

However, if you use url_for or other helpers to build a complete URL, consider passing an explicit :host so that links point to the regional host name the visitor used, e.g.:

url_for(action: 'index', host: "admin.#{request.env[:ORIGINAL_DOMAIN]}")

Because the original domain is captured before the host is normalized, you can even link to specific subdomains while keeping the visitor's regional domain.

Note (my original observation / answer):

Your code stores the TLD length of each request in a shared global variable.

When two parallel requests arrive on two different threads, it is a matter of chance which TLD length each request will see (most probably the last one written, if no data "shearing" occurs).
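The interleaving can be reproduced deterministically in plain Ruby. This sketch uses a hypothetical SharedConfig module as a stand-in for ActionDispatch::Http::URL, a hypothetical handle method with the same set/restore shape as the question's middleware, and queues to force one specific ordering of two overlapping requests.

```ruby
# Stand-in for ActionDispatch::Http::URL: one setting shared by all threads.
module SharedConfig
  class << self
    attr_accessor :tld_length
  end
end
SharedConfig.tld_length = 1

UK_PATTERN = /\.co\.uk\z/

# Same shape as the question's middleware: set for matching hosts, restore after.
def handle(host)
  original = SharedConfig.tld_length
  SharedConfig.tld_length = 2 if host =~ UK_PATTERN
  yield
ensure
  SharedConfig.tld_length = original
end

uk_in_flight = Queue.new
es_in_flight = Queue.new
uk_finished  = Queue.new
seen = {}

uk = Thread.new do
  handle("api.ourapp.co.uk") do
    uk_in_flight << true  # the .co.uk request is now running
    es_in_flight.pop      # keep it running until the .es request has started
  end
  uk_finished << true
end

es = Thread.new do
  uk_in_flight.pop        # arrive while the .co.uk request is running
  handle("api.ourapp.es") do
    seen[:es] = SharedConfig.tld_length
    es_in_flight << true
    uk_finished.pop       # finish after the .co.uk request
  end
end

[uk, es].each(&:join)
puts seen[:es]                # 2, although .es needs 1
puts SharedConfig.tld_length  # 2, left behind for every later request
```

Besides reading the wrong value, the .es request's ensure block restores the 2 it captured while the .co.uk request was in flight, so the wrong TLD length outlives both requests.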

A thread-safe approach stores the information in the env hash, giving each request its own TLD length.

The following example will NOT work on its own: nothing in Rails reads the value it stores, and I don't handle TLD lengths and have no idea how to calculate them. It only shows how to use the env as thread-safe, per-request storage.

class Rack::TldLength

  def initialize(app, host_pattern, host_tld_length)
    @app = app
    @host_pattern = Regexp.new(host_pattern)
    @host_tld_length = host_tld_length
  end

  def call(env)
    # Old approach (global, not thread safe):
    # ActionDispatch::Http::URL.tld_length = @host_tld_length if(env["HTTP_HOST".freeze].to_s =~ @host_pattern)
    # New approach: store the TLD length for this request only, in its own env.
    env[:hosts_tld] = (env["HTTP_HOST".freeze].to_s =~ @host_pattern) ? @host_tld_length : ActionDispatch::Http::URL.tld_length
    @app.call(env)
  end
end
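To complete the idea, downstream code has to read the value from env rather than from ActionDispatch::Http::URL.tld_length. This self-contained sketch uses a hypothetical TldLengthTagger middleware, a hypothetical "myapp.tld_length" env key, and a lambda standing in for the Rails app. It uses a dotted, uniquely prefixed string key, which is what the Rack spec asks for when applications store their own data in the env.

```ruby
# Middleware: records the TLD length for *this* request in its env hash.
class TldLengthTagger
  ENV_KEY = "myapp.tld_length" # hypothetical key name

  def initialize(app, host_pattern, tld_length, default_tld_length = 1)
    @app = app
    @host_pattern = Regexp.new(host_pattern)
    @tld_length = tld_length
    @default_tld_length = default_tld_length
  end

  def call(env)
    host = env["HTTP_HOST"].to_s.split(":").first.to_s # drop any port
    env[ENV_KEY] = host.match?(@host_pattern) ? @tld_length : @default_tld_length
    @app.call(env)
  end
end

# Stand-in for the app: reads the per-request value instead of a global.
inner_app = lambda do |env|
  host = env["HTTP_HOST"].split(":").first
  tld_length = env[TldLengthTagger::ENV_KEY]
  subdomain = host.split(".")[0...-(tld_length + 1)].join(".")
  [200, { "content-type" => "text/plain" }, [subdomain]]
end

app = TldLengthTagger.new(inner_app, /\.co\.uk\z/, 2)

%w[api.ourapp.es api.ourapp.co.uk:443].each do |host|
  _status, _headers, body = app.call({ "HTTP_HOST" => host })
  puts "#{host}: #{body.first}" # both print "api"
end
```

In a Rails app, a custom routing constraint could presumably pass this value to request.subdomain, which accepts a tld_length argument; the built-in subdomain: route option would still rely on the global setting.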

Answer 2

ActionDispatch::Http::URL stores tld_length as a module variable, which is to say as a single global variable for your whole application. There is no way to make that thread safe. I suspect the design thinking was that your app would be at just one domain and so one global setting, set at startup, would be sufficient, so it was not necessary to make tld_length thread safe.

ActionDispatch is pretty central to Rails, so I would try to avoid mucking with it. How hard would it be to run 2 Puma servers and send all tld_length = 2 traffic to one server and tld_length = 1 to the other server? If you are running a server farm, that would be a reasonable sharding key and would keep you from having to do any further tricks.

If I had to run it in 1 server, I would look into modifying ActionDispatch::Http::URL so that it stores tld_length in a thread-local variable instead of a module variable, and set it on each request. You would also have to change the methods that use the module variable as a default value, such as domain, to use the thread variable as the default instead; an accessor method is probably the easiest way to do that.
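A plain-Ruby sketch of that idea, without patching Rails itself: a hypothetical RequestTld module keeps the value in Thread.current and restores it after each request, and a hypothetical domain_for helper uses it as its default argument, mirroring the accessor suggestion. The split/last logic is modelled on how Rails extracts the domain of a named host; IP hosts and other edge cases are ignored.

```ruby
# Hypothetical per-request TLD length store; not part of Rails.
module RequestTld
  DEFAULT = 1

  # Thread.current[] is actually fiber-local in Ruby, which is fine as long as
  # each request is handled on a single fiber (typical for Puma's threads).
  def self.current
    Thread.current[:request_tld_length] || DEFAULT
  end

  # Set the value for the duration of the block, then restore the previous one.
  def self.with(tld_length)
    previous = Thread.current[:request_tld_length]
    Thread.current[:request_tld_length] = tld_length
    yield
  ensure
    Thread.current[:request_tld_length] = previous
  end
end

# Default argument is evaluated on each call, so it sees the caller's value.
def domain_for(host, tld_length = RequestTld.current)
  host.split(".").last(tld_length + 1).join(".")
end

threads = { "api.ourapp.es" => 1, "api.ourapp.co.uk" => 2 }.map do |host, len|
  Thread.new do
    RequestTld.with(len) do
      sleep 0.01 # widen the window in which the two threads overlap
      [host, domain_for(host)]
    end
  end
end
domains = threads.map(&:value).to_h
p domains
```

Each thread sees only its own value, and the main thread still falls back to the default of 1.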

How to set tld_length dynamically on a Rails app with puma (thread safe)

Question

2 answers

solution1
1 2020-07-23 15:33:27

solution2
0 2020-07-25 00:49:59
